THE TESTER AND DECISION THEORY


Our society continually confronts people with decisions for which they have 
inadequate information. It is for this reason that psychological and educational
tests exist. Some of the problems on which tests are brought to bear are
purely individual: the uncertainties of a boy trying to choose a career, or of a
young couple trying to decide whether they are suited for marriage. Equally 
numerous are the occasions when an administrator, teacher, or clinician turns 
to tests for assistance in making decisions about many people. The personnel 
manager wishes to know whom to hire; the military psychologist determines 
which men are adequately trained and ready for duty; the teacher inquires 
whether his class should be taught at a rapid or a slow pace. There is no end 
to such examples of the role of tests in decision making. 

It is therefore desirable that a theory of test construction and use consider 
how tests can best serve in making decisions. Little of present test theory, 
however, takes this view. Instead, the test is conceived as a measuring instru- 
ment, and test theory is directed primarily toward the study of accuracy of 
measurement on a continuous scale. Hull in 1928 voiced a principle that has 
been the root of nearly all work on test theory: "The ultimate purpose of using 
aptitude tests is to estimate or forecast aptitudes from test scores" (43,
p. 268). It is this view which we propose to abandon. We acknowledge the
usefulness of accurate estimation -- but we maintain that the ultimate purpose
of any personnel testing is to arrive at qualitative decisions such as those
illustrated in the preceding paragraph.

The value of a test depends on many qualities in addition to its accuracy. 
Especially to be considered are the relevance of the measurement to the par- 
ticular decision being made, and the loss resulting from an erroneous deci- 
sion. Recommendations regarding the design, selection, and interpretation of 
a test must take into account the characteristics of the decisions for which the 
test will be used, since the test which is maximally effective for one decision 
will not necessarily be most effective elsewhere. 

An appropriate test theory can evolve from a general and systematic
examination of the decision problems for which tests are used, and of the
demands these problems place upon the test. In such a study of decision making
and its implications for testing, we are fortunately able to draw on extensive
mathematical contributions, many of them recent.


The following comment by a leading contributor to decision theory,
M. A. Girshick, indicates the need for study of this sort and the difficulties:

... here is a situation in which the foundational development, difficult as
it is, is easier than the application to actual problems. ...


n statistics. 
cisions be tal 


i mainly in [the collection of alternative acti 
be taken] .. , which in one case is finite and in the other i; 
m attempt to clarify the natur i 


its greatest generality, 
mediately placed in 


a general framework 
tions anı 


d implications. 


, and is being used to some ex- 


» and sociologists (see 65). Within Psychology 
t to bear on Problems r. i 


ns when he is one of a 
The resulting 
(67) proved to have great i 


"Theory of Games" 
terest not only for economists, 


but also for mili- 
The work of Wald, who extended the statistical theory of testing hypotheses
into a general "statistical decision theory" (69), paralleled and has merged
with game theory. Wald's interest was partly instigated by problems of
inspection and quality control in industrial production, where decisions to
accept or reject manufactured articles are required. (Such inspection has an
obvious similarity to one variety of personnel testing.) Both the statistician
and the
game theorist are concerned with decisions in the face of an uncertain future, 
but in game theory the uncertainty comes from the prospective selfish actions 
of competitors, while in statistical decisions the uncertainty comes from ran- 
dom variation in events. 

Studies of decisions take into account the risk, benefit, or "utility" of vari- 
ous courses of action. Much thought has been given to problems of defining or 
estimating such utilities. Economists and mathematicians have tried, for ex- 
ample, to estimate what hypothetical utilities, attached to various consequen- 
ces, could account for a person's decision to gamble even though he knows 
that the "house percentage" prevents his winning over the long term. Experi- 
mental psychologists have recently begun to study how people actually choose 
between courses of action, that is, what utilities are consistent with their ac- 
tions (65). 

Even where it is possible to determine what benefit each individual will re- 
ceive from various courses of action, the best decision is hard to select if the 
interests of many individuals are at stake as, for example, in economic deci- 
sions of government (4). One might assess the total effect of a decision by 
combining the benefits received by all individuals, but this can scarcely be 
done unless we can measure all persons' utilities on the same scale. Is the
benefit rendered by reducing a rich man's taxes by $100 equal to the benefit 
when a poor man's taxes are reduced by $100? A whole body of theory referred 
to as "welfare economics" has tried to resolve dilemmas encountered in bal- 
ancing the welfare of many persons affected by a decision. 

A tremendous amount of knowledge has been developed around the general
topic of decision problems. Since the tester is concerned with decision making,
it is reasonable to expect significant understandings to result from restating
testing problems in such a way that this knowledge can be brought to bear.


CHARACTER OF THIS REPORT 


Our study did not begin as an attempt to translate utility and decision the- 
ories into psychometric terms. Rather, it began with recognition of certain 
questions about tests which we could not answer adequately under then-exist- 
ing formulations. One question, for instance, was how one might evaluate a 
test battery which measures several aptitude dimensions. Hitherto, testers 
have discussed only the validity of each single score or of the combined test
against any single criterion. Except for Horst's very recent work (40), no
effort has been made to consider the total contribution of the test in making
decisions over all criteria. In trying to formulate theoretical models which 
would permit study of such questions, we were led inevitably to examine the 
utility of personnel decisions. It was at this point that we recognized our 
approach as a case within Wald's more general decision model. This com- 
prehensive model, we found, clarified a wide variety of testing problems. 
We have not drawn all possible inferences from decision theory. Indeed, 
we barely touch on some issues which patently lead to deeper and deeper in- 
quiries. Furthermore, decision theory has been growing at a rate which has 
quite prevented us from exploiting the significance for the tester of new de- 
velopments. There is no reason to think that this flow of new materials will 
abate. The majority of the recent publications belong to the higher
stratospheres of mathematics, and are to be comprehended only dimly by
non-mathematicians. We have confined our own study to the simplest and most
definite mathematical methods. No doubt some person with greater mathematical
competence than ours can extract from this literature much further material
for the tester.


... procedures are, ...

At a more technical level, this report examines specific principles of test
development and use. Some of the ... new ... Others are new ... interpret these
papers, or integrate them ... Since the topics include efficient design of
batteries, interpretation of validity coefficients, and use of tests for
individual assessment, we anticipate that the results will concern a large
and diverse audience of test users.

The reasoning of this report is mathematical rather than empirical. A ...
advantages and hazards. The advantages ... can be stated, the finality with
which they can be established, and the wide range of circumstances to which
a derivation can apply. In contrast, an empirical study covers only a few
particular circumstances and obtains results which are perturbed by various
sorts of sampling error.


The disadvantage of the mathematical attack is that it involves assumptions
which may not adequately describe real conditions. At times it is even
necessary to make assumptions about postulated variables which have never
been observed. In our model, for example, it is assumed that the contribution
of a person to an institution, and the costs of testing, can be evaluated
in some tangible, countable unit. Girshick's comments on this difficulty are
worth quoting:


Here again we see that decision theory demands a great deal of the 
decision maker. It demands that he be in a position to evaluate numeri- 
cally for every possible state of nature in the situation under considera- 
tion the consequences of any of the actions he might take. It has been 
argued by many that no human being possesses the ability so to evaluate 
the utility of the various actions in all possible states. ... The inability
of the decision maker to formulate clearly the loss function is, in fact,
a stumbling block in determining what a rational mode of behavior is
for him. More bluntly, it is impossible to tell a person what is an opti- 
mal way for him to behave if he is unable to formulate clearly what he 
is after.... Decision theory acts as a gadfly to the research worker. 
It says to him: You cannot solve your problem unless you more clearly 
define your goal and the consequences of your decisions. Such a prod- 
ding is likely to be healthy. (36, p. 463) 

Possibly a major contribution of the approach through decision theory is 
that it points clearly toward a variety of needed empirical studies. Thus our 
argument indicates that many present personnel procedures are based on
assumptions regarding the interaction of the characteristics of the individual
and the nature of the treatment to which he is assigned. These implicit, widely
employed assumptions have been tested sketchily if at all. A similar finding
is that tests used for placement of individuals (among various levels of
instruction, for example) should be validated in ways quite different from
those in general use.
To provide the general reader with an overall grasp of concepts and results,
the main body of this report contains a minimum of technical detail and
mathematical reasoning. The detailed mathematical argument is found in a series
of technical appendices.

This report does not attempt to present decision theory per se. A mathematical
presentation of decision theory may be found in Theory of Games and
Statistical Decisions, by Blackwell and Girshick (6). The latter author has
presented an elementary survey of basic concepts in the field (36), and a
popularized treatment is offered in Bross' Design for Decision (15). The subject
of utility analysis has been reviewed by Adams (1) and Edwards (31), the latter
review being particularly directed toward psychologists.

Decision theory is provocative, and forces one to alter his accustomed
thought patterns. We anticipate that each reader will find a different facet of
our argument important for him, and often his thoughts will veer off into
pathways not covered by this investigation. The intended contribution of this
monograph is, in a word, to stir up the reader's thoughts.


TYPES OF PERSONNEL DECISIONS 


Any situation where a person is confronted with alternative courses of
action is a decision problem. A test theory should encompass in some manner
all the various decisions for which ... testing ... demands upon tests, and ...
Indeed, it could be said that ... type of decision. For theoretical purposes,
however, it is ... to separate guidance from ... which classify ...
characteristics significant for decision theory.

Institutional decisions such as selection of employees may be distinguished
from individual decisions such as choice of a vocation. In the typical
"institutional" decision, a single person makes a large number ... decisions.
The "individual" decision is ... choice confronting the decision maker will
rarely or never recur.

Examples of institutional decisions are classification of military recruits,
screening of pupils to identify who should be studied by counselors, and
determination of appropriate therapy for psychotic patients. ... decisions are
made regarding many people ... value system. The hospital wishes to use ...
efficiently; the industry wishes to choose employees ... contribute most to its
balance sheet.

... be taken into account, but only insofar as they affect the realization of the goals
of the institution. From the viewpoint of an admissions officer, the best use of 
space in a medical school is to admit those who have the greatest probability 
of success, considering both motivation and ability. Between two candidates, 
the institutional value system would require impartial choice of the one with 
the better prognosis.

In an individual decision, the best course of action depends on the individ- 
ual's value system and varies from one individual to another. A particular 
goal which would be worth any risk to one individual may have little value to 
another. Thus one boy applying for admission to medical school might value
highly even a small chance of success, and would regard a shift from medicine
into teaching or pharmacy as abandoning all his aspirations; another having
comparable prospects of success might be contented or even relieved to abandon
a medical career. Probability of success cannot be the sole consideration
in the individual decision. 

In the institutional decision, we may think of the decision maker as trying 
to maximize the benefit from a whole series of similar decisions. That is, he 
seeks a policy which will work best "on the average" over many decisions 
about admission or job assignment or therapy. Since each decision involves
the same set of values, he can combine different decisions and strike some
type of statistical balance which gives the best overall outcome.

The individual decision is often unique. The choice may occur only once 
in a lifetime. Even where the decision can be "remade" at a later time if the 
first course of action works out badly, the original decision has an uncancellable 
influence on the welfare of the individual. A poor choice of curriculum at the 
outset of a student's college education will continue to handicap him long after 
he has discovered his error and changed to a more suitable curriculum. It is 
meaningless to speak of "averaging his risk", since he makes only a few such 
decisions at most. 

The insurance business is founded on just this distinction between institu- 
tional (or collective) decisions and individual decisions. The company can 
pool its decisions; it charges a premium such that the company's loss over 
all risks underwritten will be balanced by the totalled premiums. The premium 
is further increased by margins for safety, for administrative expense, and 
for profit. The purchaser is involved in only a few possible insurance trans- 
actions. In each, he decides to insure or not insure -- i.e., to incur a small 
and certain loss, or to risk a large loss at a low level of probability. If he 
insures, his premium is larger than his pro rata share of the risk. A man 
considering a large number of insurance transactions would find this course 
of action uneconomical because he loses on the average. He might more profit- 
ably make other financial provisions for covering possible losses. The man 
considering only a few insurance policies cannot expect that the outcomes for 
him will come close to the statistical average. He must evaluate the discrete
alternative risks according to his personal scale of values.
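
The arithmetic behind the insurance illustration can be sketched as follows. This is only an illustrative sketch: the loss amount, the probability, and the proportional "loading" (standing for the margins for safety, expense, and profit) are hypothetical figures, and the function names are ours, not part of the original argument.

```python
def fair_premium(loss, prob):
    """Pro rata share of the risk: the expected loss on one policy."""
    return loss * prob


def charged_premium(loss, prob, loading):
    """Premium after adding a proportional margin for safety, expense, and profit."""
    return fair_premium(loss, prob) * (1.0 + loading)


# Hypothetical figures: a loss of 10,000 with probability .01, and a 25% margin.
LOSS, PROB, LOADING = 10_000.0, 0.01, 0.25

# Long-run average cost per transaction to a purchaser making very many of them.
average_if_insured = charged_premium(LOSS, PROB, LOADING)
average_if_uninsured = fair_premium(LOSS, PROB)

# Worst result of a single transaction, which is what matters to the man who
# makes only a few such decisions.
worst_if_insured = charged_premium(LOSS, PROB, LOADING)
worst_if_uninsured = LOSS

print(average_if_insured, average_if_uninsured)
print(worst_if_insured, worst_if_uninsured)
```

On these assumed figures the insured purchaser pays more on the average but limits his worst single outcome to the premium, which is the distinction the paragraph draws between pooled and unique decisions.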
Test theory, as it now stands, is relevant chiefly to institutional decisions.
Regression formulas, for example ... squared error of estimate, ...

... mathematical and statistical problems associated with them, are under
active study by many persons at
present. One reason for this interest, apart from the purely mathematical 
attractiveness of the problems, is that ordinal utilities have been found a 
superior basis for economic theory (1). In psychology, Coombs (22) has 
pointed out that the investigator who restricts himself to cardinal assumptions 
is unable to attack certain problems adequately. These include probability 
learning, level of aspiration, trouble-shooting, preference among rewards or 
among wagers, etc. Ordinal models are being used in some current research 
on these topics. Out of this work may come practical methods for dealing 
with ordinal utilities, and important insights for test theory. In our opinion,
however, such developments will bear primarily on individual decisions rather
than institutional decisions.


Assumption of a priori knowledge 


An investigator who wishes to maximize his expectations must have a firm
basis for these expectations. An insurance company can successfully predict
the net outcome of its wagers solely because of its accumulated experience
with the distribution of losses for individuals grouped on the basis of age,
occupation, state of health, etc. Knowledge of the expected distribution of
outcomes (with and without information from testing) is likewise necessary in
personnel decisions. Approximate distributions are ordinarily obtained by
administering the test to a sizeable sample of individuals, and using the
distributions thus found as a basis for future policy. It is necessary to
assume, of course, that the relevant features of the situation remain stable.


Two-person games 

Much of the effort in decision theory and the theory of games has been to 
avoid assumptions such as those made above. von Neumann and Morgenstern 
(67) point out that many decision problems take the form of a "zero-sum
two-person game", in which each person tries to increase his payoff at the expense
of the other. The player cannot depend on statistical experience to predict 
his opponent's move. When one player knows what move the other is going to 
make, he may be able to take advantage of this knowledge. A course of action 
which could have great value "if all goes well" is not a sound choice if it leaves 
a serious weakness for the opponent to capitalize on. 

In such a case, it has been suggested that instead of trying to maximize ex- 
pected payoff it might be better to follow a "minimax" principle. The minimax 
principle is to select always that course of action which will yield the least 
loss when one's opponent makes the least favorable response. The minimax 
principle is highly conservative, embodying the view that "if anything can go 
wrong, it will". Wald (69) has pointed out that statistical decisions can be 
described as two-person games, where the decision maker is playing against 
Nature. The decision maker tries to guess (from partial information) what
conditions exist in the particular situation that confronts him; Nature has
chosen ...
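
The contrast between maximizing expected payoff and following the minimax principle can be sketched with a small loss table. The two courses of action, the losses, and the assumed response probabilities below are hypothetical, chosen only so that the two principles disagree.

```python
# Rows: courses of action open to the decision maker.
# Columns: the two possible responses of the opponent (or of Nature).
# Entries are losses to the decision maker (hypothetical values).
LOSSES = {
    "bold":     [0.0, 10.0],
    "cautious": [3.0, 4.0],
}


def expected_loss_choice(losses, response_probs):
    """Pick the action with the smallest expected loss, given assumed
    probabilities for the opponent's responses."""
    def expected(row):
        return sum(p * loss for p, loss in zip(response_probs, row))
    return min(losses, key=lambda action: expected(losses[action]))


def minimax_choice(losses):
    """Pick the action whose worst-case loss is smallest."""
    return min(losses, key=lambda action: max(losses[action]))


# With statistical experience that the second response is rare, "bold" is best.
print(expected_loss_choice(LOSSES, [0.9, 0.1]))
# Without such experience, the conservative rule guards against the worst case.
print(minimax_choice(LOSSES))
```

The expected-loss rule needs the response probabilities as an input, which is exactly the a priori knowledge the minimax principle does without.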
Figure 1. A classification procedure

... discriminant problems involving no quotas, as well as classification to fill
quotas. Information is used as an aid to classification; this information may
be continuous or qualitative, and may involve one or several dimensions.
Figure 1 illustrates schematically the general nature of the classification
problem.

It is helpful to consider separately two special cases of classification
which are much simpler to analyze than the general problem. The first case
is where information is univariate. Even when information is obtained in
terms of more than one score or dimension, it is a common practice to combine
such multivariate information into a single composite score before making
decisions. If scores on the same composite scale are used in making all
decisions between treatments, the classification problem may be termed a
placement problem (Figure 2). The most common examples are dividing students
among sections to be taught at different rates, or using a trade test for
coarse grouping of applicants. Use of a composite score to discriminate the
brain-injured from the mentally deficient also exemplifies a placement
decision. We shall see later that "measurement" problems can be considered as
a particular variant of the use of tests for placement decisions.

Figure 2. Placement and selection procedures

Problems may also be differentiated according to whether or not rejection 
is allowed as one possible treatment, so that the person is eliminated from 
the institution. We can refer to these as selection problems; Figure 2 illus- 
trates a selection strategy based on a non-linear combination of two scores. 
We shall give particular attention to the relatively simple problem of
selection on the basis of univariate information, where all accepted men are
treated alike.
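
Placement and selection on a single composite score can both be written as simple cutting-score rules. The sketch below is illustrative only: the cut scores, the treatment labels, and the convention that a score exactly at a cut goes to the higher treatment are our assumptions.

```python
from bisect import bisect_right


def place(score, cut_scores, treatments):
    """Placement: every person is assigned to some treatment according to
    where his composite score falls among ascending cut scores.
    Requires len(treatments) == len(cut_scores) + 1."""
    if len(treatments) != len(cut_scores) + 1:
        raise ValueError("need one more treatment than cut scores")
    return treatments[bisect_right(cut_scores, score)]


def select(score, cutoff):
    """Selection: rejection is an allowable treatment, and all accepted
    persons are treated alike."""
    return "accept" if score >= cutoff else "reject"


# Hypothetical sections taught at different rates.
SECTIONS = ["slow", "average", "rapid"]
CUTS = [40, 60]

print(place(55, CUTS, SECTIONS))
print(select(55, cutoff=60))
```

Both rules use the same univariate information; they differ only in whether one of the treatments removes the person from the institution.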


Constraints upon decisions 

Decision problems, whether concerned with classification, placement, or 
selection, may be distinguished on another basis: the presence or absence of 
certain constraints. Two types of constraint are common: (a) number of 
treatments per man, and (b) number of men per treatment. The great majority 
of institutional decisions call for one treatment per man. The man is sent to
one training course or to one job, or given one diagnosis. In individual
decisions ...

... within his field. Choice points where he altered his plan may be readily
recognized; there are also less conspicuous points where he considered a change of
goal but did not make one. The institution is likewise making a sequence of 
decisions about the student's acceptability in the chosen career. Each time 
grades are given or he is admitted to a new level of training, the institution 
chooses between eliminating him and "investigating further". The college 
admissions office may view a decision to admit a borderline student as ending 
the decision process; but from the perspective of a dean considering the student's 
poor marks a year later, it was a decision to gather further information by 
observing his college performance. 

After a test is given, one either can assign the person to a treatment or 
can decide to administer further tests. Broadly speaking, those further tests 
may include job tryouts, education and training programs, or even initiating 
psychotherapy and observing the patient's response. A distinction between 
"tests" and "treatments" is a convenience, but one of doubtful logical justifi- 
cation. Choice of any treatment which does not remove the individual from 
the hands of the decision maker is to some extent an "investigatory decision", 
since new facts can always be used to modify an earlier plan. A decision- 
making procedure which permits information at one point to determine what 
information will be gathered next is called a sequential strategy. It will prove 
important and profitable to extend test theory to encompass sequential methods. 


A TAXONOMY OF DECISION PROBLEMS 


The foregoing distinctions suggest a possible taxonomy for decision prob- 
lems. Problems similarly classified are describable within a set of general 
principles. While a taxonomy adequately covering the entire range of testing 
problems would be hopelessly complicated, a small effort in this direction 
will serve to summarize the present chapter and to indicate how broad is the 
domain that concerns the tester. This will make particularly clear the fact 
that subsequent chapters barely begin the needed exploration of personnel 
decisions. 

To classify any decision problem, we ask these questions: 

1. (a) Are the benefits obtained from a decision evaluated in the same 
way for each person? or 
(b) Are different values used in deciding about each person? 

2. (a) Is the decision about each individual made independently? or 
(b) Are decisions about various persons interrelated? 

3. (a) Is each individual assigned to just one of the available
treatments? or
(b) May he be assigned to multiple treatments?

4. (a) Is one of the allowable treatments "reject"? or
(b) Are all persons retained in the institution?

5. (a) Is the information used in univariate form? or
(b) Multivariate form?

6. (a) Are decisions final? or
(b) May one decide to obtain further information prior to final
decisions?

These six questions define 2^6 = 64 different patterns. While all of them might
... be confusing rather than clarifying. For this reason we shall discuss
relatively specific problems within the 64 possibilities.


We can use a code pattern such as aaaaba to describe any decision problem 
in terms of the six questions. 
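
The code patterns can be enumerated mechanically. The following sketch lists all 64 patterns and expands a code such as aaaaba into the answers it stands for; the short labels for each question are our paraphrases of the six questions above.

```python
from itertools import product

# (label, meaning of answer "a", meaning of answer "b") for questions 1-6.
QUESTIONS = [
    ("values", "same for each person", "different for each person"),
    ("decisions", "independent", "interrelated"),
    ("treatments per person", "just one", "multiple"),
    ("rejection", "allowed", "all persons retained"),
    ("information", "univariate", "multivariate"),
    ("finality", "final", "further information may be sought"),
]


def all_patterns():
    """Every six-letter code over the alphabet {a, b}."""
    return ["".join(letters) for letters in product("ab", repeat=len(QUESTIONS))]


def describe(code):
    """Expand a code such as 'aaaaba' into a label -> answer mapping."""
    if len(code) != len(QUESTIONS) or set(code) - {"a", "b"}:
        raise ValueError("code must be six letters, each 'a' or 'b'")
    return {
        label: (if_a if letter == "a" else if_b)
        for letter, (label, if_a, if_b) in zip(code, QUESTIONS)
    }


print(len(all_patterns()))
print(describe("aaaaba"))
```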
... present special characteristics. Some of these patterns (chiefly aaaaaa) have
been treated in scattered papers, but only Brogden has done connected studies 
on them. We are forced to conclude that there is no "theory of mental tests" 
at the present time, although there are many fragmentary theories. 

The present report shows some connections and differences between the 
various systems, and may therefore be regarded as a first step toward an 
integrated theory. There will probably always be a need for many alternative 
test theories covering different cases, however. While an abstract mathematical
treatment can cover all or most of the range of personnel problems
simultaneously, definite treatment of specific problems will usually be more
comprehensible to the test user. The purpose of the overriding system is to
make clear the differences among problems, so that misapplication of a
sub-theory will not lead to incorrect practices.
There is, in the first place, an ind...

Figure 3. Schematic view of a decision process


STRATEGIES


A strategy (or decision function) is a rule for arriving at decisions. A
strategy must state what the decision maker will do in any possible contingency
(cf. 36). Thus a rule for college admission may say, "Any graduate of an
accredited high school will be admitted; other applicants will be expected to
pass certain tests of General Educational Development". This is equivalent
to the following formal strategy:

Given information that individual i is a graduate, make the terminal
decision "accept" with probability 1.

Given information that i is not a graduate, make the investigatory
decision "Give GED tests" with probability 1.

At the second stage, given the further information that i's score on GED
is Pass, make the terminal decision "accept" with probability 1. Given
information that i's score on GED is not Pass, make the terminal
decision "reject" with probability 1.

A strategy consists of a set of conditional probabilities. Probabilities
employed in a strategy need not be restricted to 1 or zero. Given certain
information about the individual, the probability of each decision is
specified. The probabilities that constitute a strategy can be written as a
matrix. Strategy matrix 1 describes each stage of the admission rule stated
above, and matrix 2 combines the two stages. Every entry is of the form
P(d|y), the probability of making decision d, given information y. Matrix 2
makes clear that a strategy specifies the decision to be made in each possible
contingency. Under the present strategy such a condition as "graduate,
non-pass GED" cannot arise; if the test were routinely given to every
applicant, however, the decision rule would need to provide for this
contingency also.
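
The admission rule can be written out as a table of conditional probabilities and followed stage by stage. The sketch below is one possible encoding, not a notation used in the report: the dictionary layout, the category labels, and the function names are our own.

```python
import random

# P(decision | information category) for the two-stage admission rule.
STRATEGY = {
    "graduate":                   {"accept": 1.0, "reject": 0.0, "give GED": 0.0},
    "non-graduate":               {"accept": 0.0, "reject": 0.0, "give GED": 1.0},
    "non-graduate, pass GED":     {"accept": 1.0, "reject": 0.0, "give GED": 0.0},
    "non-graduate, non-pass GED": {"accept": 0.0, "reject": 1.0, "give GED": 0.0},
}


def is_valid_strategy(strategy, tol=1e-9):
    """Each row must be a probability distribution over the decisions."""
    return all(
        all(0.0 <= p <= 1.0 for p in row.values())
        and abs(sum(row.values()) - 1.0) < tol
        for row in strategy.values()
    )


def decide(strategy, info, rng):
    """Draw one decision according to the conditional probabilities for info."""
    decisions = list(strategy[info])
    weights = [strategy[info][d] for d in decisions]
    return rng.choices(decisions, weights=weights, k=1)[0]


def admit(is_graduate, passes_ged, rng=None):
    """Follow the strategy until a terminal decision is reached."""
    rng = rng or random.Random(0)
    info = "graduate" if is_graduate else "non-graduate"
    decision = decide(STRATEGY, info, rng)
    if decision == "give GED":  # investigatory decision: gather more information
        info += ", pass GED" if passes_ged else ", non-pass GED"
        decision = decide(STRATEGY, info, rng)
    return decision


print(admit(is_graduate=False, passes_ged=True))
```

Because every probability here is 1 or 0, the random draw is degenerate; the same code would serve for a strategy with intermediate probabilities.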

                                           Decision
Information Category (y)         Accept   Reject   Continue Testing

First Stage:
  Graduate                          1        0            0
  Non-Graduate                      0        0            1
Second Stage:
  Non-Graduate, Pass GED            1        0            0
  Non-Graduate, Non-pass GED        0        1            0

Strategy Matrix 1. Plan for two stages of decision regarding college admission


                                                  Decision
Information Category (y)         Accept   Reject   Give GED,   Give GED,   Give GED,
                                                   Accept      Reject      ...

Graduate                            1        0        0           0           0
Non-Graduate, Pass GED              0        0        1           0           0
Non-Graduate, Non-pass GED          0        0        0           1           0

Strategy Matrix 2. Same plan, combined form

The word strategy suggests that a conscious policy guides decisions. Even
if choices are based on habit or some chance mechanism rather than policy,
they can be described in a matrix. Each entry states what proportion of the
time the decision maker makes each choice, under each set of possible
conditions (information). The matrix describing a decision maker's ...
decisions. ... recorded will often be other than 1 or 0. For example, suppose
we examine recommendations of an interviewer for a private college where all
applicants take a certain scholastic aptitude test. A tabulation of his
decisions might report his decision pattern or "strategy" as this matrix:


                                                 Decision
                                              Accept   Reject
High-school graduate, SAT above 70             .80      .20
Not high-school graduate, SAT above 70         .80      .20
High-school graduate, SAT below 70             .20      .80
Not high-school graduate, SAT below 70         .10      .90


Every entry is a conditional probability. This interviewer evidently gives
much more weight to SAT than to the fact of graduation. The table tells us
that he accepts some graduates below 70; we do not know whether ...

Once a way of evaluating outcomes is agreed on, the benefit from various
strategies can be compared. If the decision maker aims to maximize expected 
utility, it is always better to assign persons in the same information category 
to the same treatment than to distribute them without further information. 
Sometimes a quota forces one to divide persons with the same observed charac- 
teristics, but otherwise in an ideal strategy matrix for institutional decisions 
the entries will always be 1 or 0. 
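
The claim that an ideal strategy matrix contains only 1's and 0's can be checked numerically. In the sketch below the observed strategy is the interviewer's matrix given earlier, while the proportions of applicants in each category and the expected utilities of accepting or rejecting are hypothetical values supplied only for illustration; quotas are ignored.

```python
# Observed strategy: P(decision | information category).
OBSERVED = {
    "graduate, SAT above 70":     {"accept": 0.80, "reject": 0.20},
    "non-graduate, SAT above 70": {"accept": 0.80, "reject": 0.20},
    "graduate, SAT below 70":     {"accept": 0.20, "reject": 0.80},
    "non-graduate, SAT below 70": {"accept": 0.10, "reject": 0.90},
}

# Hypothetical proportion of applicants falling in each category.
P_INFO = {
    "graduate, SAT above 70": 0.4,
    "non-graduate, SAT above 70": 0.1,
    "graduate, SAT below 70": 0.3,
    "non-graduate, SAT below 70": 0.2,
}

# Hypothetical expected utility of each decision within each category.
UTILITY = {
    "graduate, SAT above 70":     {"accept": 1.0, "reject": 0.0},
    "non-graduate, SAT above 70": {"accept": 0.6, "reject": 0.0},
    "graduate, SAT below 70":     {"accept": -0.2, "reject": 0.0},
    "non-graduate, SAT below 70": {"accept": -0.8, "reject": 0.0},
}


def expected_utility(strategy, p_info, utility):
    """Sum over categories y of P(y) * sum over decisions d of P(d|y) * u(y, d)."""
    total = 0.0
    for y, row in strategy.items():
        total += p_info[y] * sum(p * utility[y][d] for d, p in row.items())
    return total


def ideal_strategy(utility):
    """Send everyone in a category to the decision with the highest expected
    utility for that category, so every entry is 1 or 0."""
    ideal = {}
    for y, by_decision in utility.items():
        best = max(by_decision, key=by_decision.get)
        ideal[y] = {d: (1.0 if d == best else 0.0) for d in by_decision}
    return ideal


IDEAL = ideal_strategy(UTILITY)
print(expected_utility(OBSERVED, P_INFO, UTILITY))
print(expected_utility(IDEAL, P_INFO, UTILITY))
```

Within each category the expected utility is a weighted average of the utilities of the available decisions, so no mixture can exceed the best single decision; this is why the all-or-none matrix does at least as well under any assumed utilities.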

Tabulation of actual strategy matrices reveals ways in which the decision- 
making practice departs from the ideal. Within a given category, the decision 
maker often differentiates, for example recommending different treatments 
for persons whose score patterns are negligibly different. He may believe
that this differentiation is based on additional information not represented in
the scores, for example, impressions gained from interviewing. Unless such
added information has considerable validity, the judge who differentiates within
an objectively defined group may actually impair the utility of his decisions
by introducing random variation (26; 53, p. 117). Once systematic errors in
strategy are identified, they can be corrected by retraining.


EVALUATION OF THE DECISION-MAKING PROCEDURE 


In evaluating a strategy for making decisions, one asks such questions as
these:

1. Does this procedure arrive at the best decisions possible, with this
   body of information?
2. Would gathering some other (or additional) information permit
   better decisions?
3. How much difference is there in the goodness of decisions arrived
   at by any two procedures?

Evaluation of a strategy involves evaluation of possible outcomes and the
prediction of possible outcomes; these require separate consideration. First,
it is necessary to state what value the decision maker places on each possible
outcome. Then if the outcomes from a given strategy can be predicted, it is
possible to evaluate the utility of the strategy.


Outcomes 


The outcome consists of all the consequences of a given decision which
concern the person making the decision (or the institution he represents).
What outcome will result when the chosen treatment is applied depends on the
characteristics of the individual and ordinarily on further unspecified
situational variables. For example, the outcome of a decision to accept a
student depends on unmeasured motivational factors on the one hand, and on
the particular courses and instructors he selects on the other.
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systematic comparison would establish 


the best assignment. Having only fallible information, the decision maker
can at best predict the probability distribution of outcomes, or state the
expected outcome over many similar decisions.


Prediction of outcomes 


In industry, outcomes of hiring include the man's hourly production, his
spoilage, his length of stay with the company, his effect on the morale and
tenure of other employees, etc.
Empirical results from previous cases are required to determine how the
information (y) is related to the criterion (c). This experience provides a
validity matrix. There is one validity matrix for each treatment. In general
form, it consists of a set of conditional probabilities $p_{c/yt}$.


Information                  Criterion States (c)
Categories (y)          1             2             3
      1            $p_{1/1t}$    $p_{2/1t}$    $p_{3/1t}$
      2            $p_{1/2t}$    $p_{2/2t}$    $p_{3/2t}$
      3            $p_{1/3t}$    $p_{2/3t}$    $p_{3/3t}$


Validity matrix for treatment t


The criterion states may be simple judgments (successful, ...

... if the operational criterion is multidimensional it can be reduced to a
single dimension by the application of appropriate weights. The validity
matrix then takes the form of a multivariate distribution of test and
criterion scores.

Valuation of outcomes 


Each information category leads to a different expected distribution of
outcomes for a particular treatment. To compare alternative strategies for
assigning individuals in a particular information category to a treatment,
one must judge the desirability of each possible outcome. To evaluate outcomes
cardinally requires assigning a value $e_c$ to each possible criterion score.
We shall assume here that such evaluations are expressed on an interval scale
of utilities, but they may vary with c in any manner. To take a specific
example, consider a criterion scale reporting the proportion of perfect
objects produced




by each machine operator. The higher this quality measure, the more valuable
the operator is -- but the benefit may not be a linear function. At the extreme
upper end, a slight rise in quality of output may permit elimination of routine
inspection. If so, a 2% rise in quality from 94% to 96% would represent a
great gain. Where the quality is low, on the other hand, complete inspection
is required, and the operator whose score is 76% yields negligibly more
benefit than one at 74%. Practical difficulties of assigning values to
outcomes will be discussed in Chapter X.

The evaluated outcome may be referred to as a payoff. We can determine
the expected payoff from an individual in a particular category $y_i$ by
weighting the value ($e_c$) of each outcome by its probability
($p_{c/y_i t}$) and summing over all outcomes. The cost of a test or any other
procedure for gathering information must be expressed in utility units and
deducted from the expected payoff.
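
The weighting just described amounts to a single sum. The sketch below is
only an illustration: the function name and the three-state example (its
probabilities, values, and cost) are assumptions introduced here, not figures
from the text.

```python
def expected_payoff(outcome_probs, outcome_values, cost=0.0):
    """Expected payoff for one information category under one treatment.

    outcome_probs  -- probability of each criterion state, given the
                      category and the treatment (must sum to 1)
    outcome_values -- value assigned to each criterion state, in utility units
    cost           -- cost of gathering the information, in the same units
    """
    if len(outcome_probs) != len(outcome_values):
        raise ValueError("need one value per outcome")
    if abs(sum(outcome_probs) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    gross = sum(p * e for p, e in zip(outcome_probs, outcome_values))
    return gross - cost


if __name__ == "__main__":
    # Hypothetical category: failure, adequate, and superior performance.
    probs = [0.2, 0.5, 0.3]
    values = [-1.0, 1.0, 4.0]   # values need not be evenly spaced
    print(expected_payoff(probs, values, cost=0.1))
```

Because the values enter only through the weighted sum, any nonlinear
valuation of the criterion (such as the inspection example above) is
accommodated simply by changing the entries of `outcome_values`.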


Expected utility from a strategy 


As we have formulated the problem, a strategy is only to be evaluated by
its total contribution when applied to a large number of decisions. "Expected
payoff from an individual" has little meaning unless one can average actual
payoffs over a large number of individuals in the same information category.
For this reason our model as developed here does not yield results
interpretable with respect to individual decisions.

If we know the distribution of scores $y_i$ in the population tested, the
expected net utility for a large number of decisions is determined simply by
adding the expected payoff for each $y_i$, weighted by the probability of that
score. This can be stated algebraically, using the following symbols:

U     = utility of the set of decisions
N     = the number of persons about whom decisions are made
y     = information category
t     = treatment
c     = outcome
$e_c$ = value of outcome
$C_y$ = cost of gathering information

$$U = N \sum_y p_y \sum_t p_{t/y} \sum_c p_{c/yt} e_c - N \sum_y p_y C_y$$   [1]

The $p_y$ describe the assumed y distribution, the $p_{t/y}$ are the entries
of the strategy matrix, and the $p_{c/yt}$ are the entries of the validity
matrix. $C_y$ may or may not vary from score to score. Whatever strategy gives
the greatest value of U is to be preferred.
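
The utility of a set of decisions can be computed directly from the three
ingredients named above: the score distribution, the strategy matrix, and one
validity matrix per treatment. The following is a sketch under stated
assumptions. The data layout (nested lists indexed by category, treatment, and
outcome) and every number in the example are hypothetical; a third outcome,
"no contact," is added here so that rejection can carry a value of zero.

```python
def total_utility(n, p_y, strategy, validity, values, cost):
    """Utility of a set of n decisions.

    p_y[y]            -- probability of information category y
    strategy[y][t]    -- P(treatment t | category y), the strategy matrix
    validity[t][y][c] -- P(outcome c | category y, treatment t)
    values[c]         -- value of outcome c, in utility units
    cost[y]           -- cost of gathering information for category y
    """
    gross = 0.0
    for y, prob_y in enumerate(p_y):
        for t, prob_t in enumerate(strategy[y]):
            payoff = sum(p_c * e for p_c, e in zip(validity[t][y], values))
            gross += prob_y * prob_t * payoff
    testing = sum(prob_y * c_y for prob_y, c_y in zip(p_y, cost))
    return n * (gross - testing)


# Hypothetical example: two score categories (high, low), two treatments
# (accept, reject), three outcomes (success, failure, no contact).
P_Y = [0.5, 0.5]
VALUES = [10.0, -5.0, 0.0]
VALIDITY = [
    [[0.8, 0.2, 0.0], [0.3, 0.7, 0.0]],   # accept: rows are high, low
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # reject: always "no contact"
]
COST = [0.2, 0.2]

ACCEPT_HIGH_ONLY = [[1.0, 0.0], [0.0, 1.0]]
ACCEPT_EVERYONE = [[1.0, 0.0], [1.0, 0.0]]

if __name__ == "__main__":
    for name, s in [("accept high only", ACCEPT_HIGH_ONLY),
                    ("accept everyone", ACCEPT_EVERYONE)]:
        print(name, total_utility(100, P_Y, s, VALIDITY, VALUES, COST))
```

With these assumed figures the strategy that rejects the low category yields
the larger utility, so it is the one to be preferred.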
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Little empirical information regarding payoff functions is presently avail- 
able. For any single treatment, difficulties of prediction rarely permit us to 
fit any function more complex than a linear regression formula. Conventional 
comparisons of treatments show mean differences between payoff under two 
production methods or two instructional techniques. These results indicate 
only that one treatment yields higher outcomes for persons who are on the 
average like the experimental subjects, and do not yield payoff functions. Such 
an experiment would indicate, regarding Figure 4, that treatment B is best. 
Significant interactions between ability and treatment are reported in some 
studies, but the data are generally inadequate to plot payoff functions. The 
scattered experimental evidence (see 2, for example) warrants the belief 
that payoff functions for different treatments will differ in slope as well as in 
mean value. 

Even more to the point is the fact that use of tests for educational place- 
ment implicitly assumes the existence of such different payoff functions. Our 
model therefore is merely a formal statement of what is everyday accepted 
without question, and examining its implications for testing is undoubtedly 
important. The realization that this assumption underlies practice is in itself 
an advance, because it makes clear the need for research on payoff functions. 
But let us first defend the claim that this assumption is currently made. 
Dividing a group on the basis of some univariate information and assigning
each segment to a different treatment must be justified by the belief that
greater payoff will be obtained in this manner. With two treatments, there are
three possibilities.

1. The payoff functions are identical, in which case placement is
   valueless since the treatment to which a man is assigned makes no
   difference in payoff.

2. The payoff function for one treatment is uniformly higher than the
   other for all score levels. In this event it is unwise to divide the
   group. All persons should be given the superior treatment unless
   institutional constraints require both treatments to be used (for
   example, where there are too few therapists to give psychotherapy
   to all patients).

3. The payoff functions intersect somewhere within the score range. If
   this is the case, as in Figure 4, it is profitable to assign men to
   different treatments, so that each receives the treatment which "gets
   the most out of him".


Dividing students into sections to receive instruction at different rates 
clearly assumes the existence of condition 3, for there is no administrative 
necessity to vary the instructional pace from section to section. 
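
The three possibilities can be told apart mechanically once the two payoff
functions are known. The sketch below assumes linear payoff functions, each
given as a hypothetical (intercept, slope) pair, and a score range supplied by
the user; the function name and the example figures are illustrative only.

```python
def compare_payoff_lines(line_a, line_b, low, high):
    """Classify two linear payoff functions over the score range (low, high).

    Each line is an (intercept, slope) pair. Returns a pair
    (condition, cutting_score), where condition is one of
    "identical", "uniform", or "intersect".
    """
    (a0, a1), (b0, b1) = line_a, line_b
    if a0 == b0 and a1 == b1:
        return ("identical", None)        # placement is valueless
    if a1 == b1:
        return ("uniform", None)          # parallel: one is always higher
    crossing = (b0 - a0) / (a1 - b1)      # score at which payoffs are equal
    if low < crossing < high:
        return ("intersect", crossing)    # placement is profitable
    return ("uniform", None)              # crossing lies outside the range


if __name__ == "__main__":
    # Treatment A rewards high scores more steeply than treatment B.
    print(compare_payoff_lines((0.0, 2.0), (1.0, 1.0), -3.0, 3.0))
```

When the result is "intersect", the returned score is the natural cutting
point: persons above it go to the treatment with the steeper payoff function,
persons below it to the other.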

The assumption that payoff functions do intersect, making placement
profitable, is consistent with available theories about instruction. A person
who
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lacks readiness to profit from one experience may be able to learn from 
another. Theories about response to therapy and about job performance like- 
wise support an expectation that payoff functions will intersect. We therefore 
anticipate that empirically determined payoff functions will often be like those 
of Figure 4. Accepting the concept of intersecting payoff functions will allow
us (in Chapter VI) to raise fundamental questions regarding the construction
and validation of placement tests. We shall also use the concept as a base for
studying certain selection problems.


Multivariate response surfaces 


The problem of the tester may be further clarified if we think of the payoff
function as a response surface (9) in multidimensional space. Under any
one treatment, the outcome for a given person might be presumed to be exactly
predictable, if enough is known about his characteristics and the treatment
is ... some variance will always be unpredicted, and therefore one can only
describe the expected payoff for a person having certain characteristics.
It is convenient to think of the measured characteristics as expressed in
terms of orthogonal factors $s_i = s_1, s_2, \ldots$ Any test score may be
resolved by factor analysis into a linear composite of certain s dimensions.
The person's characteristics are described by the pattern $s_i$. For each
pattern $s_i$ there is an expected payoff $e_{s_i t}$. The payoff function is
a response surface continuous in the $s_i$. A different surface is expected
for each treatment.

In any decision problem, we are actually concerned chiefly with a subset of
the various s's, those which are related to payoff. For any set of treatments
under consideration, the test scores may be so factored as to extract those
dimensions which account for variance in payoff under at least one treatment.
Payoff under any treatment depends on one or more of these aptitude
dimensions. Any other of the test-score dimensions will have zero coefficients
in payoff functions for all the treatments. Our argument to this point has
emphasized that one may choose that one of several distinct treatments for
which $e_{s_i t}$ is greatest, as estimated for an individual from whatever
test information is available.

Quite a different response surface may be considered if we regard
treatments ...
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the decision maker is confined to a few alternatives. But in principle the 
treatments given a student may vary in a continuous manner. Such variables
in an instructional situation may include the amount of supportiveness of the
instructor, the pace of instruction, or the extent to which the instruction
demands mathematical ability. In psychotherapy, the treatment may vary in
warmth, depth of interpretation, and extent of medical supplementation. If
the concept that treatments vary continuously is accepted, for individuals
having the characteristics $s_i$ there is a response surface
$e_{s_i} = f(w_j)$. This surface presumably has an absolute maximum (and
perhaps local maxima).

If the relation of payoff to the treatment parameters were independent of
the person's characteristics, then such parameters would be of little interest
to the personnel tester. The same treatment would be best for all individuals.
There is often an interaction, however, among individual characteristics and
treatments, so that the location of the maximum payoff differs for different
patterns $s_i$.


The personnel worker is usually concerned with choosing individuals to
fit a fixed treatment. In clinical and guidance testing, however, it is the
individual that has been fixed and the treatment is to be selected from many
possibilities. The task of the clinician appears to be to adjust the treatment
as nearly as possible to the optimum for the individual. He can ordinarily use
his information to select from among an almost infinite variety of treatment
conditions that one pattern which seems best for the person. It is true that
... of information are a poor basis for making such adaptations,
but this indicates only that gains from adaptation are likely to be limited at
present. Certainly the physician does not hesitate to compound drugs, rest,
exercise, diet, and bedside encouragement in an individualized formula. The
same flexibility is ultimately open to the teacher and the psychotherapist,
but their supporting science is less adequate at this time.

The response surface linking payoff to treatment factors has been ignored 
in test theory because it plays little part in traditional personnel decisions. 
Assigning men to fixed categories, or predicting their scores under a single 
treatment, is all that the industrial and military psychologist has attempted. 
But even in personnel classification adaptation may be possible. One may
vary such important conditions of a job as amount of on-the-job instruction, 
amount of supervision, and pacing of work. Introducing radical changes in 
degree of responsibility or amount of automatic control makes changes in pay- 
off even more likely. So long as one can expect to employ men of a given 
quality, one should set the treatment so as to maximize their payoff. Within 
the limits of practicality, a change in quality of men calls for adaptation of 
treatment. Thus in all personnel testing there may be need to consider pay- 
off as a function of treatment variables. 

Considering both types of variables affecting payoff, we see that eit 
Since both s and w are (in principle) multivariate, the problem cannot be
easily represented geometrically. Algebraic analysis is complex but not
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SELECTION DECISIONS WITH SINGLE-STAGE TESTING 


Chapter IV examines the most familiar of personnel decisions, selection 
on the basis of a single test or composite score. The most common examples 
are industrial selection, admission of students, and psychiatric screening of 
recruits. Typically, a constraint is placed on the number or the minimum 
quality to be accepted. 

A discussion of selection decisions at this point has many advantages, even
though not all the results to be presented are new. Stating a familiar problem
in terms of our model will clarify utility analysis, and lay the groundwork for
the consideration of other personnel decisions.


THE INTERPRETATION OF VALIDITY COEFFICIENTS: EARLIER WORK 


Evaluating the benefit obtained from tests is of considerable practical in- 
terest. Many of the studies have been stimulated by the "public relations" 
problem of convincing business management or military authorities that the 
benefits from testing programs justify their cost. Professional workers, in 
planning testing programs, must decide which tests to use, and how many; and 
this requires balancing costs against estimated benefits. 

The contribution of a testing program depends on the importance of the deci- 
sions to be made, the selection ratio, and possibly other characteristics of the 
situation. The situation being fixed, the cost and validity of the test determine 
whether it should be used and how much is gained by using it. Thus assessing 
the value of the testing program requires a study of the benefits associated 
with any level of test validity. 

The relation of benefit to validity has long been regarded as an important
question. Interpretation of validity coefficients is an important topic in the
training of test users, as is evidenced by its ubiquitous appearance in profes-
sional texts on testing and also in literature on testing aimed at a more
general audience. As at least four different formulas for interpreting the
validity
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test is linearly related to the validity, regardless of the selection ratio. He 
argues that a test of validity .50 gives 50% of the improvement that would 
result from using a perfect test. This result is contrasted with an interpreta- 
tion based on the coefficient of forecasting efficiency which Brogden quotes 
from an outstanding text on measurement published in 1936: "Tests with a 
coefficient of validity less than .5 are practically useless except in distin- 
guishing between extreme cases." The linear relation has also been discussed 
by Cochran (20), and in modified form by Richardson (55), Jarrett (45), and 
Brown and Ghiselli (16). 

The Brogden generalization, like that of Taylor and Russell, encourages
far more extensive use of tests than do the interpretations in terms of $r^2$
or E. Wesman (70) rationalizes this seeming discrepancy by pointing out that
an error of prediction regarding a superior man who is accepted does no harm.
So long as it is wise to hire both A and B, it does not matter that the test
overestimates A's production, and underestimates B's. Only errors which
"cross the borderline" so that a man is hired who should not be, or vice versa,
impair the utility of the decisions. The analyses disagree because the
coefficients E and $r^2$ assign evaluations differently than do the more
recent analyses. In computing E, the loss to the institution is regarded as
proportional to the discrepancy between predicted and actual performance; and
in the coefficient of determination, as proportional to the square of this
discrepancy. But a man's contribution in a particular setting depends on his
performance, not on the prediction made about him. Errors in prediction are
costly only when they cause errors in decision. This is recognized by the
Taylor-Russell treatment and, in slightly different ways, by its successors.
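
The size of the disagreement among these interpretations is easy to exhibit
numerically. The sketch below is illustrative only. It assumes the usual
definition of the coefficient of forecasting efficiency, E = 1 - sqrt(1 - r^2),
which the discussion above names but does not write out; the function name is
hypothetical.

```python
import math


def validity_indices(r):
    """Three readings of the same validity coefficient r (0 <= r <= 1)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    return {
        # Linear interpretation: share of the gain from a perfect test.
        "linear": r,
        # Coefficient of determination.
        "determination": r ** 2,
        # Coefficient of forecasting efficiency (assumed standard formula).
        "forecasting_efficiency": 1.0 - math.sqrt(1.0 - r ** 2),
    }


if __name__ == "__main__":
    for name, value in validity_indices(0.50).items():
        print(f"{name:>24}: {value:.3f}")
```

For a validity of .50 the linear reading credits the test with half the
benefit of a perfect predictor, the coefficient of determination with a
quarter, and the forecasting-efficiency index with roughly 13 per cent, which
is why the older indices discourage the use of tests of moderate validity.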


The value of a test can be stated only in terms of the specific type of
decision problem, the strategy employed, the evaluation attached to the
outcome, and the cost of testing. We shall use the utility model to derive
this relationship for simple selection decisions, and later for a variety of
other conditions. It will be increasingly clear that the linear interpretation
of the validity coefficient, or some minor departure from it, is appropriate
for simple selection, but that other indices must be chosen in other decision
problems. Brogden's insightful statement indicates why interpretation of
validity coefficients presents so much difficulty:


In general, it is probably true that statistical formulas are not devel- 
oped with the primary objective of providing interpretations most mean- 
ingful for a research worker having problems peculiar to a given area 
of research. The formula is more apt to be developed as an expression 
of certain mathematical relationships. In the derivation, assumptions --
often highly limiting in nature -- are introduced as necessary to the de- 
velopment of a given formula. Applications are sought at a later date. 
Very often it is found that the assumptions are so restrictive that the co- 
efficient can legitimately be used in only a small proportion of certain 
types of application. In other instances the coefficient may have legiti- 
mate application but may not provide the interpretation needed. (12, p. 170) 
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BASIS FOR JUDGING GAIN FROM TESTING 
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Best a priori strategy 


Even when the a priori population is defined as suggested by Jarrett, chance 
selection is not necessarily the alternative strategy with which testing is to 
be compared. The decision maker who does not test may use interviews or 
other available information to provide some basis for a decision. Thus,
military recruits could be screened with better-than-chance efficiency by
utilizing recorded information about civilian job experience and grade of
school completed, instead of tests. Tests will improve decisions, but not as
much as is
implied by the phrase "improvement over chance". Similarly, if testing were 
impossible in educational placement or selection, one would not fall back upon 
a chance decision, because previous school records could be used with substan- 
tial predictive validity. 

Tests should be judged on the basis of their contribution over and above the
best strategy available, making use of prior information. The manner in which
this comparison should be made depends on our contemplated use of that infor-
mation. At least three possibilities should be considered.


1. All prior information is used and will continue to be used for
   pre-screening of applicants. In this case, validity coefficients should
   be calculated on the pre-screened group. The a priori strategy would
   be chance selection from this group to fill the quota.

2. Decisions will be based either on the test or on certain other
   information such as school records. One or the other will be used, not
   both. In this case, the difference in utility between the two
   strategies, taking cost of each into account, would indicate the
   advantage of testing.

3. Previously available information and testing information will be
   combined into a single score or pattern of scores on which decisions
   will be based. In this case, the gain due to testing is the difference
   between the utility obtained by the composite scoring technique and the
   utility obtained using only the previously available information.


Contribution of various types of tests


Evaluating a test in the light of its independent contribution to utility
might reshape many of our present testing policies. Conrad makes this demand
regarding validity information in test manuals:

... we ought to know what is the contribution of this test over and
beyond what is available from other, easier sources. For example, it
is very easy to find out the person's chronological age; will our measure
of aptitude tell us something that chronological age does not already
tell us? It is also easy, in some cases, in a local community, to obtain
a person's school record. If that is so, then the question is, Does the
intelligence test or the measure of aptitude tell us anything that is not
already told by the previous information? The independent contribution
is certainly something which should definitely be known, but very seldom
is it revealed to us in the information which test publishers provide.
(21, p. 65)
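
The "independent contribution" Conrad asks for can be estimated whenever the
intercorrelations are reported. The sketch below is an illustration rather
than a procedure taken from the text: it applies the standard formula for the
multiple correlation of a criterion with two predictors, and all function
names and correlation values are hypothetical.

```python
import math


def multiple_r(r_prior, r_test, r_between):
    """Validity of the best linear composite of two predictors.

    r_prior   -- correlation of the prior information with the criterion
    r_test    -- correlation of the test with the criterion
    r_between -- correlation between the prior information and the test
    """
    if abs(r_between) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_prior ** 2 + r_test ** 2
                 - 2.0 * r_prior * r_test * r_between) / (1.0 - r_between ** 2)
    return math.sqrt(max(r_squared, 0.0))


def incremental_validity(r_prior, r_test, r_between):
    """Gain in validity from adding the test to the prior information."""
    return multiple_r(r_prior, r_test, r_between) - abs(r_prior)


if __name__ == "__main__":
    # School record correlates .50 with the criterion; so does the test;
    # the two predictors correlate .60 with each other.
    print(round(incremental_validity(0.50, 0.50, 0.60), 3))
    # A test of validity .30 that overlaps .60 with the record adds nothing.
    print(round(incremental_validity(0.50, 0.30, 0.60), 3))
```

In the first case a test whose own validity is .50 raises the composite
validity only from .50 to about .56; in the second the test contributes
nothing beyond the school record, even though its validity coefficient taken
alone is not zero.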
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For men accepted, a certain benefit accrues to the institution. We shall 
assume that this payoff has a linear regression on test score. A decision to 
reject a man will mean in most instances that he has no further contact with 
the institution. We may therefore regard the outcome of such a decision as 
having a value of zero. We shall also assume that test scores are normally 
distributed, with zero mean and unit standard deviation. This implies that the 
expected payoffs for randomly selected men are normally distributed. 

This assumption of normal distribution of benefits may not be realistic. 
Standards set by management often place a lower limit on production which 
would cause the distribution to be skewed negatively. Voluntary limitation of 
output in a work group skews the curve in the opposite direction. And thorough 
standardization of a task eliminates variability entirely among those who re- 
main on the job. Specifying a particular distribution permits a more complete 
exploration of functional relations, but generalization to other distributions 
is hazardous. Formulas derived under the normal assumption would need to 
be modified for any specified non-normal distribution of payoffs. Our general 
conclusions regarding the relationship of validity to utility depend only on the 
nature of the hypothesized payoff function. Many conclusions regarding
particular optimal strategies, however, depend in addition upon the normal
assumption. Brogden (10, 11) has examined thoroughly the effect on utility of
departures from normality.


Linear relation of utility to validity 


The net gain in utility per man tested from selection for a fixed treatment
is linearly related to the validity of the test, under the assumptions stated
above. As Appendix 1 demonstrates,

$$\Delta U = \sigma_e r_{ye} \xi(y') - C_y$$   [2] (1.8)*

$C_y$ is the average cost of testing one person, $r_{ye}$ is the correlation
of the test with the evaluated criterion in the a priori population, and
$\sigma_e$ is the standard deviation of this payoff. $y'$ is the cutting score
on the test, and $\xi(y')$ is the ordinate of the normal curve at that point.
In this expression, $\sigma_e r_{ye}$ is the slope of the payoff function
relating expected payoff to score. It would be possible to employ here, as we
do later, the concept of an aptitude s intervening between score and payoff,
such that $r_{ye} = r_{ys} r_{se}$. This would make the development more
closely comparable to that for adaptive treatment, but the argument would be
less straightforward.
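
The gain per man tested can be evaluated for any selection ratio once the
cutting score is found from the normal curve. The sketch below is a minimal
illustration under the assumptions already stated (normally distributed
scores, linear payoff, zero value for rejection); the function name and the
dollar figures in the example are hypothetical.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # test scores: zero mean, unit standard deviation


def gain_per_man_tested(sigma_e, r_ye, selection_ratio, cost):
    """Net gain in utility per man tested under fixed-treatment selection.

    sigma_e         -- standard deviation of payoff, in utility units
    r_ye            -- correlation of test with evaluated criterion
    selection_ratio -- proportion of those tested who are accepted
    cost            -- average cost of testing one person
    """
    if not 0.0 < selection_ratio < 1.0:
        raise ValueError("selection ratio must lie strictly between 0 and 1")
    cutting_score = _UNIT_NORMAL.inv_cdf(1.0 - selection_ratio)
    ordinate = _UNIT_NORMAL.pdf(cutting_score)
    return sigma_e * r_ye * ordinate - cost


if __name__ == "__main__":
    # Hypothetical figures: payoff SD of 1000 units, validity .40, cost 5.
    for ratio in (0.1, 0.3, 0.5, 0.7, 0.9):
        gain = gain_per_man_tested(1000.0, 0.40, ratio, 5.0)
        print(f"selection ratio {ratio:.1f}: gain per man tested {gain:8.2f}")
```

Doubling the validity doubles the gain before cost, which is the linear
relation in question. The ordinate is largest at a cutting score of zero, so
for a fixed number of men tested the gain is greatest at a selection ratio of
.50.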

* Where double numbers are provided, the former is the identifying number of
the equation in the text, and the second locates it in the Appendix. (1.8)
indicates the eighth equation in Appendix 1.

It is particularly important to note that the slope of the payoff function is
influenced by several factors, since conventional methods place almost
exclusive
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emphasis on the test-criterion correlation. An increase in correlation leads 

to an increase in slope, but slope also depends on the spread of criterion scores
(which may vary for different treatments) and on the value associated with one
unit on the criterion scale. The relation of gain in utility to selection ratio and
validity is shown in Figure 5. On the vertical axis of this and subsequent
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Effect of o, on utility. A utility model brings in the important parameter 
$\sigma_e$. For any one treatment $\sigma_e$ is constant, and indicates both
the magnitude and practical significance of individual differences in payoff.
A decision maker selecting men for many different assignments is frequently
able to test for only a few of them, and must decide where tests can make the
greatest contribution. The importance of an assignment can justify using a
test of low validity (10, p. 71). Gain in utility depends on the product
$\sigma_e r_{ye}$. Therefore a test of validity .30 for one decision may be
more beneficial than a test having validity .60 for some other selection
decision, if $\sigma_e$ for the former decision is more than twice as large.
A large $\sigma_e$ is an indication that individual differences on the
criterion in question have large practical importance. Tests for important
decisions which fall far short of the ideal predictor may be much more worth
using (and improving) than tests which give excellent guidance in making minor
decisions.

Effect of cost on utility. Cost of testing should be taken into account along
with validity and $\sigma_e$ in deciding which test to use for a particular
decision. The utility contributed by test 1 is greater than that from test 2
whenever
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Figure 6. Utility as a function of selection ratio and cost 
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enough to justify use of the less valid test, but at some point (see Chapter VII) 


increases in test length can increase cost enough to offset accompanying gains
in validity.


Effect of selection ratio on utility. With a fixed number of applicants, total
gain in utility is greatest when 50% of these men are to be accepted. That is
to say, tests make the greatest contribution to t
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the number of persons to be screened to fill these positions, reducing the selec- 


tion ratio correspondingly. In this case the selection ratio of (1.13) can be 


interpreted as indicating the number of people to test relative to the number
selected. The smaller the cost of testing, the larger the number of applicants
it is profitable to test to fill the quota. It is always desirable (assuming
fixed quotas and linear payoff) to test at least twice as many men as will be
accepted if the test is worth using at all. At the low costs most likely to be
encountered in practice, a large number of individuals should be tested
relative to the number to be selected.
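
The trade-off between testing cost and the number of applicants screened can
be explored numerically. The sketch below is an assumption-laden illustration,
not the derivation of Appendix 1: it takes the gain per man tested under the
linear, normal model, divides by the selection ratio to express it per man
selected, and searches a grid of selection ratios for the best one. The
function names and figures are hypothetical.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()


def gain_per_man_selected(sigma_e, r_ye, selection_ratio, cost):
    """Net gain per position filled when 1/selection_ratio men are tested.

    Assumed form: (sigma_e * r_ye * ordinate - cost) / selection_ratio.
    """
    cutting_score = _UNIT_NORMAL.inv_cdf(1.0 - selection_ratio)
    per_tested = sigma_e * r_ye * _UNIT_NORMAL.pdf(cutting_score) - cost
    return per_tested / selection_ratio


def best_selection_ratio(sigma_e, r_ye, cost, steps=99):
    """Grid search over selection ratios 1/(steps+1), ..., steps/(steps+1)."""
    ratios = [k / (steps + 1) for k in range(1, steps + 1)]
    return max(ratios,
               key=lambda q: gain_per_man_selected(sigma_e, r_ye, q, cost))


if __name__ == "__main__":
    # Hypothetical figures: payoff SD 1000, validity .40, two testing costs.
    for cost in (8.0, 40.0):
        ratio = best_selection_ratio(1000.0, 0.40, cost)
        print(f"cost {cost:5.1f}: best ratio {ratio:.2f}, "
              f"test about {1.0 / ratio:.0f} men per opening")
```

Under these assumptions the best ratio never exceeds .50 for a test worth
using, and lowering the cost of testing pushes it toward smaller values, that
is, toward testing more men for each opening.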


SELECTION WITH ADAPTIVE TREATMENT 


It is reasonable to suppose, as Chapter III pointed out, that the decision
maker will often be allowed to adjust the treatment according to the quality of
the men he accepts. Such "adaptive treatment" may be expected to yield greater
benefit than is obtained under fixed treatment, save where by good fortune the
fixed treatment happens already to be fitted to the accepted men.


A general payoff surface 


The first step, in undertaking a study of adaptive treatment, is to introduce
a general payoff surface linking payoff functions for different treatments.

A postulated aptitude factor. When only one treatment and one test are
under consideration, the contribution of testing can be studied adequately in
terms of the relation of test score to criterion. The payoff function is the
regression of the evaluated criterion on the test score. To deal with
competing treatments, it will be convenient to postulate an aptitude factor s.

The advantage of the intervening variable s is that it permits us to
separate aspects of the decision problem chiefly associated with the
differences between treatments from those chiefly associated with the test.
Where several tests measure the same underlying variable, and we wish to
consider their relative effectiveness, the introduction of s permits us to
invoke the same payoff function with relation to all tests. Each treatment has
its own payoff function; but if all the treatments under consideration depend
on the same aptitude, it is much simpler to describe the relation of payoff to
this aptitude than to have a separate payoff function for each test-treatment
pair.

As explained in Chapter III, test scores may be factored, and among those
factors the ones which account for variance in payoff under any of the treat-
ments being considered are referred to as aptitude factors. That is to say, any
factors which, when removed from the matrix of $r_{ye}$, reduce all residuals
to zero constitute a set of s dimensions. There will not in general be a unique
set of s's for a given body of data, since these factors may be rotated in
various ways. One would ordinarily rotate in one of the many ways that would
reduce the number of s dimensions to a minimum.
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In studying selection and placement, we restrict ourselves to the case 


where only a single aptitude dimension is required to account for all
communality between test scores and payoffs. If we are dealing with a single
test, as we do in most subsequent chapters, the "true score" on the test may
be regarded as such an s dimension. It is assumed that the test and payoff
under any treatment have no common variance save that accounted

3. All cross-sections where t is a constant are straight lines.
4. All cross-sections where s is constant are parabolas. This implies
that there is a best treatment for each value of s, and that the loss
from assigning a person to treatment t rather than the treatment t_s
optimal for him is equal to a(m_t - m_{t_s})^2.

The parameters a, b, and c depend upon the particular s linking treatments
and tests.
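
The straight-line and parabolic cross-sections can be illustrated with a
minimal sketch. It assumes a surface of the form e(s, m) = -a*m^2 + (b + c*s)*m,
where m stands for the treatment level m_t. This functional form, the function
names, and the numerical values of a, b, and c are assumptions chosen only to
agree with the stated properties; they are not the surface defined in the text.

```python
# Hypothetical quadratic payoff surface (an assumed form, not taken from the text):
#   e(s, m) = -a*m**2 + (b + c*s)*m
# s is the aptitude, m the treatment level; a, b, c are illustrative values.
A, B, C = 1.0, 0.5, 0.8


def payoff(s, m, a=A, b=B, c=C):
    """Expected payoff: a straight line in s for fixed m, a parabola in m for fixed s."""
    return -a * m ** 2 + (b + c * s) * m


def best_treatment(s, a=A, b=B, c=C):
    """Treatment level that maximizes payoff for aptitude s (vertex of the parabola)."""
    return (b + c * s) / (2 * a)


def loss(s, m, a=A, b=B, c=C):
    """Payoff forgone by using treatment m instead of the best one for aptitude s."""
    return payoff(s, best_treatment(s, a, b, c), a, b, c) - payoff(s, m, a, b, c)


if __name__ == "__main__":
    s, m = 1.2, 0.3
    print(best_treatment(0.0))               # b / 2a: treatment suited to the average man
    print(loss(s, m))                         # loss from a non-optimal treatment
    print(A * (m - best_treatment(s)) ** 2)   # the same quantity, a * (m - m_best)**2
```

Under this assumed form the loss reduces algebraically to a times the squared
distance between the treatment used and the optimal one, which is the property
stated in item 4.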


Figure 8. Expected payoff as a function of aptitude and treatment


Utility in adaptive selection 


The formulas for estimating gain in utility are developed in Appendix 3.
In evaluating the benefits gained from selection with adaptive treatment, and
contrasting them with fixed-treatment benefits, one must reconsider not only
the relation of payoff to score,

So long as we confine ourselves to tests measuring a single aptitude s, the
choice of treatment to be given after testing depends on r_ys. The more valid

in use is that best suited to the average man, i.e., m_t = b/2a (cf. (3-1)).
Such a situation could arise if previous selection policy had been fairly
satisfactory, but it is thought that greater benefit will result if men of even
higher quality are obtained. Using the same treatment after testing and a fixed
selection ratio, the gain in utility is


au = Èr Hy!) > Cy [8] 


If, however, the institution were to now change the treatment to that optimally 
suited to the average ability of the selected men, the gain in utility is increased 


by [expression garbled in the source]. This additional gain might be found, for
example, when introducing tests in a school selection program where teaching
methods are subsequently adjusted to the higher level of ability of the
accepted students.
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Figure 9. Comparison of gains from fixed and adaptive selection 
For the sake of illustration (Figure 9) we may assume that the treatment t_a
originally in use is that suited to very superior men whose average aptitude is
s_a. Treatment t_0 is the best of all treatments for a group of men whose
average aptitude is zero. Finally, if the men actually selected have the
aptitude s (in the figure, 0 < s < s_a) the payoff function for the best a
posteriori treatment is that marked t_s. In addition to the payoff lines for
three treatments, the figure shows a curve to which all such functions are
tangent (2.6). The tangent to the curve at any value of s shows the payoff to be
obtained by giving the optimum treatment to men having that average level of
ability. On this diagram, the six levels of utility are indicated.
The gain in utility with fixed-treatment t_a is represented by the difference
between U_{0 t_a} and U_{y t_a} (cost of testing here being disregarded). The
final utility is, in this instance, distinctly less than may be attained with a
less demanding treatment. If one were to change from t_a to t_0 without
selecting men, a benefit would accrue which is actually nearly as large as the
benefit (ΔU_fixed) reportedly attainable by testing.

considered as the gain beyond this best a priori utility, and is represented by
ΔU_adaptive (U_{y t_s} - U_{0 t_0}). Other diagrams can be examined where the
fixed treatment is located so that s_a < s, or even s_a < 0. The following
generalizations apply to all such combinations.


Even though adaptation of treatment is common in practice, discussions of the
value of testing in selection have overlooked this possibility, clinging to
fixed-treatment assumptions. The utility from selected men is always greater
with adaptive treatment than with fixed treatment, save at that one value where
s = s_a. To a certain extent, therefore, these discussions have underrated the
utility that can be obtained using the test. On the other hand, the gain due to
testing with adaptive treatment is less than the gain under fixed treatment,
because the a priori utility can be much higher under adaptive conditions. By
including a component which could be achieved by altering the treatment to use
unselected men more profitably, these discussions overrate the contribution of
testing. Which effect is the larger would depend on the payoff surface, the
location of s_a, and also on the cost of testing. If practical conditions make
it unreasonable to adapt treatment to men of average ability, then testing does
have the value claimed for it in equation [2], where utility is linearly
related to validity. But if treatment is fixed only because the question of
adapting it has not been raised, the advantage claimed in the linear relation
is much too great and the test is being credited with contributions which could
be obtained in another manner, perhaps less expensive in the long run.


The amount the test adds to the best a priori utility is described by the
surface shown in Figure 10. All cross-sections with selection ratio constant
are parabolas. Since we are comparing tests involving the same aptitude,
validity is here represented by r_ys, not r_ye. With validity constant, the
cross-section is asymmetric, having a maximum at a selection ratio somewhat
less than .50.
an .50, 


The surface plotted is based on these parameters: a ~ 


WB S473. = oe. 

Figure 10. Relation of gain in utility to validity (r_ys) and selection ratio
in adaptive-treatment selection


Implications for choice of tests 


Persons validating selection tests should consider explicitly what might be 
achieved by adapting treatments. It is necessary to determine empirically the 
shape of the payoff surface wherever adaptation is allowable. Such investiga- 
tions might well begin by comparing payoff functions for only two alternative 
treatments aimed toward the same goals, but ultimately it will be necessary to 
consider cases involving several continuous treatment dimensions. 

The traditional sequence of industrial selection research has been to estab- 
lish a job and associated training procedure, and then to seek predictors which 
will weed out applicants likely to perform poorly. It is appropriate to fix pro- 
duction goals without reference to the available selection tests, but there is no 
reason at all to regard the job organization or training methods used to attain 
those goals as being independent of such predictors. If payoff functions differ 
when different training methods are used, and these functions intersect some- 
where in the range of ability, then it may be possible to modify the training 
with advantage. Determining objectives is the first step in selection research; 
after that, the research should seek to identify the best combination of tests,
training method, and job organization.

Testing can bring about two types of change: an increase in the average
quality of accepted men; and a further increase in benefits when treatment is
adapted to fit this new level of quality. When treatment can be modified, it
is possible that much of this increase in utility can be attained by adapting
treatment to fit the level of ability of unselected men. The relative advantages
of selection without adaptation, adaptation without selection, and adaptation


tion in screening. If one insists on utilizing only the cream of the applicants,
the training can be less costly or can attain a higher terminal level. If a
lower grade of applicant is accepted, one takes best advantage of his ability
with a longer and less demanding training program. The quality of men selected
depends on many things: on the social conditions which make the applicant pool
larger or smaller, on the personnel requirements of other programs, and on the
validity of the relevant tests.


If two aptitudes are relevant to the same job, adapting treatment to fit men
near the average on one aptitude would permit more severe screening on the
other. The optimal strategy would be a complex mixture of adaptation and
selection.

Recognizing the possibility of adaptive treatment provides a new way of
looking at virtually every problem of test utilization. It seems likely that
new results will emerge whenever this hitherto suppressed consideration is
brought into test theory.


THE INTERPRETATION OF VALIDITY COEFFICIENTS IN SELECTION 


The preceding pages deny the possibility of any simple answer to the question,
"How valuable for selection decisions is a test with validity r?" Neither the
index of forecasting efficiency E nor the coefficient of determination r^2 is
directly related to utility when payoff functions are linear. Neither,
therefore, seems appropriate as an index of selection efficiency. Moreover, our
utility functions bear only a limited resemblance to those of Taylor and
Russell. And while utility is linearly related to validity in fixed-treatment
selection, we find that the linear relation does not apply in selection with
adaptive treatment. We may now integrate the various approaches to the
interpretation of validity coefficients in selection, and point to some
differences in rationale.
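
How differently the three indices grade the same validity can be seen in a
short sketch. It uses the standard definitions E = 1 - sqrt(1 - r^2) and the
coefficient of determination r^2; these formulas are not spelled out in the
passage above, and the function names are illustrative only.

```python
from math import sqrt


def forecasting_efficiency(r):
    """Index of forecasting efficiency, E = 1 - sqrt(1 - r**2)."""
    return 1.0 - sqrt(1.0 - r * r)


def determination(r):
    """Coefficient of determination, r**2."""
    return r * r


if __name__ == "__main__":
    # Tabulate the three indices side by side for a few validities.
    for r in (0.2, 0.4, 0.6, 0.8):
        print(f"r={r:.1f}  E={forecasting_efficiency(r):.3f}  r^2={determination(r):.3f}")
```

For every validity between 0 and 1 both E and r^2 fall below r itself, so an
index proportional to r credits a test of moderate validity with far more than
either of the other two.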

The linear relation 

Brogden's linear relation between validity and utility holds under the
following conditions:

1. Persons are divided into an accepted and a rejected group. The rejected
group is eliminated from the institution, while persons in the selected
group are given uniform treatment.
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2. The proportion of persons to be accepted is specified. 

3. The treatment to be applied to accepted men is fixed a priori. 

4. Payoff from accepted persons has a linear regression on test score.
The utility of a decision to reject is the same for each individual.
This implies that decisions are evaluated in terms of institutional
utilities.
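
Under conditions like these, the linearity can be sketched numerically. The
sketch below is not the text's equation; it assumes that test score and payoff
are bivariate normal, that the top fraction of scorers is accepted, and that
cost of testing is ignored. It relies on the standard result that the top
fraction p of a standard normal distribution has mean pdf(z)/p, where z is the
cutoff. The function name and the default sd_payoff are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mean_gain_per_accepted(validity, selection_ratio, sd_payoff=1.0):
    """Expected payoff gain per accepted man under fixed treatment.

    Assumes bivariate-normal score and payoff, top-down selection, and a
    selection ratio strictly between 0 and 1. Cost of testing is omitted.
    """
    cutoff = STD_NORMAL.inv_cdf(1.0 - selection_ratio)
    mean_score_of_accepted = STD_NORMAL.pdf(cutoff) / selection_ratio
    # Linear regression of payoff on score: the gain is proportional to validity.
    return sd_payoff * validity * mean_score_of_accepted


if __name__ == "__main__":
    for r in (0.2, 0.4, 0.6):
        print(r, round(mean_gain_per_accepted(r, selection_ratio=0.3), 3))
```

With the selection ratio held fixed, doubling the validity doubles the gain,
which is what a linear relation between validity and utility means here.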


The Taylor-Russell results 

The relation between validity and utility reported by Taylor and Russell is 
rather similar to the linear relation under many conditions. The chief differ- 
ences are that (a) they report different shapes for the validity-utility relation 
at different selection ratios, and (b) the relation becomes decidedly non-linear 
at high validity or when a high proportion of unselected men are judged success- 
ful. 

The Taylor-Russell approach assumes fixed treatment and is otherwise 
like the Brogden analysis, except that outcomes are evaluated differently. Tay- 
lor and Russell ignore rejected men, in effect assuming their contribution to 
be zero. Accepted men are classified into "successful" and "unsuccessful" 
groups, and all men in each group are regarded as making equal contributions. 
Thus, while Brogden assumes a continuous equal-interval scale for payoff at
different levels of criterion performance, Taylor and Russell assume a
discontinuous two-valued payoff. The two scales are related in such a manner as
this:


‘ 20 25 30 35 40 45 50 


ec (linear assumption) -1 0 
ee (Taylor -Russell) 0 0 0 0 1 1 
The Taylor-Russell approach resembles that used in acceptance testing of in- 
dustrial products. In that field, inspection plans are described by various tables 
indicating the efficiency with which each plan protects the purchaser against 
defective objects. The Taylor-Russell tables indicate the "average outgoing 
quality (AOQ)" for a personnel inspection plan, and are therefore comparable 
to the AOQ tables for industrial inspection. Each table indicates what propor-
tion of "defective individuals" will be passed by a particular selection ratio,
knowing the frequency of such individuals in the population.
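
A table entry of this kind can be approximated with a short numerical sketch.
It assumes a bivariate-normal relation between test score and criterion and
integrates by the trapezoid rule; the function name, the integration limit, and
the step count are illustrative choices, not the procedure used to build the
published tables. It returns the proportion of successful men among those
accepted, which is the complement of the proportion of "defective individuals"
passed.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def success_rate_among_accepted(r, selection_ratio, base_rate, steps=4000):
    """Proportion of accepted men who succeed, for validity r with |r| < 1.

    base_rate is the proportion of unselected men judged successful.
    Score and criterion are assumed bivariate normal (an assumption).
    """
    y_cut = STD.inv_cdf(1.0 - selection_ratio)   # test cutoff
    e_cut = STD.inv_cdf(1.0 - base_rate)         # criterion level counted as success
    upper = 8.0                                  # effectively +infinity for a normal score
    width = (upper - y_cut) / steps
    resid_sd = sqrt(1.0 - r * r)
    total = 0.0
    for i in range(steps + 1):
        y = y_cut + i * width
        # Probability of success for a man with test score y.
        p_success = 1.0 - STD.cdf((e_cut - r * y) / resid_sd)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * STD.pdf(y) * p_success
    return total * width / selection_ratio


if __name__ == "__main__":
    for r in (0.0, 0.3, 0.5, 0.8):
        print(r, round(success_rate_among_accepted(r, 0.3, 0.5), 3))
```

With zero validity the function returns the base rate, and the proportion
rises as validity increases.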

Where unsuccessful men are discharged or removed from their assignments 
or fail to complete the training, the difference in utility between a successful 
and an unsuccessful man is likely to be great. Training effort is wasted; morale 
of all employees suffers; management time is consumed in making the decision 
to discharge. Such costs may be far more significant than the differences 
among the workers retained. Under some conditions, moreover, differences 
in ability beyond the minimum needed to perform the job do not lead to differ- 
ences in benefit. If production is standardized either by job definition or by 
voluntary restriction of output, then benefits will depend little on ability. In 
these instances, evaluation by the proportion of successful workers among
those accepted may be fully appropriate.
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Sometimes differences in output are believed to occur, but are presently 
unmeasurable. Thus teachers who fail as instructors or class managers are 
discharged, but no agreement can be obtained on relative evaluation of the 
teachers retained. Likewise, in evaluating psychiatric screening a person who 
breaks down must be counted as markedly different in value from those who do 
not, but it is difficult to sustain any quantitative comparison of contribution 
among those who remain above the level of overt disturbance. This situation 
too justifies a step-wise evaluation. 

Examination of the number of "hits" is sometimes accompanied by an examination
of the "misses", or number of persons rejected who were of acceptable quality.
The best-known discussion of this approach is Berkson's "cost-utility" paper
(5). He defines "utility" as the proportion of unsuccessful men eliminated, and
"cost" as the proportion of successful men rejected. In any selection scheme,
raising the cutting score increases the Berkson "utility" with a corresponding
increase in "cost". The decision maker seeks a strategy which balances these
two risks most satisfactorily. A group of sociologists (28) has described
several distinct formulas for evaluating hits and misses, which need not be
reviewed here. Our thinking is most consistent with the plan which assigns
particular values to "hits" and "misses", and adjusts the cutting score to
maximize expected utility. In institutional selection with fixed quota where
men rejected leave the institution, it is not meaningful to consider misses,
however. Rejection of men of good quality does not decrease the output of the
institution; their frequency bears only on "what might have been". Two-sided
evaluations are appropriate only when the screening test divides men into two
groups who remain within the institution but are treated differently. An example
is the use of a test to predict parole, where the parolee and the man held in
prison both affect the "balance sheet" of the correctional system. These,
however, are placement rather than selection decisions.
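
Berkson's two quantities, as defined above, can be computed directly from a
record of test scores and outcomes. The sketch below is only an illustration:
the function name is hypothetical and the eight records are invented data.

```python
def berkson_rates(records, cutting_score):
    """Return (utility, cost) in Berkson's sense for a given cutting score.

    records is a list of (test_score, was_successful) pairs; men scoring
    below the cutting score are rejected.
    utility = proportion of unsuccessful men eliminated
    cost    = proportion of successful men rejected
    """
    unsuccessful = [score for score, ok in records if not ok]
    successful = [score for score, ok in records if ok]
    utility = sum(score < cutting_score for score in unsuccessful) / len(unsuccessful)
    cost = sum(score < cutting_score for score in successful) / len(successful)
    return utility, cost


if __name__ == "__main__":
    # Invented illustrative data: (score, successful on the job?)
    records = [(35, False), (40, False), (45, True), (50, False),
               (55, True), (60, True), (65, True), (70, True)]
    for cut in (42, 52, 62):
        print(cut, berkson_rates(records, cut))
```

Raising the cutting score in this example raises both quantities together,
which is the trade-off the decision maker must balance.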
Parabolic relations 


Like others who have investigated the selection problem, we find no basis
for interpreting the usefulness of tests in terms of the coefficient of
forecasting efficiency or the coefficient of determination. These indices
should not be employed in evaluating tests for selection purposes. Chapter VI
will consider the appropriateness of these indices for certain other decisions.

Paradoxically, having dismissed two traditional parabolic functions, we find
that another parabolic relationship is appropriate for evaluating tests in
selection. Where adaptive treatment is allowed and all tests being compared
measure the same aptitude, equation [7] -- or some similar function based on a
payoff surface of degree higher than two -- is the most adequate statement of
the relation of utility to validity. When the selection ratio is moderate to
high, this
parabola has little curvature and the linear relation approximates it fairly
well. At low selection ratios, concave curvature may be appreciable (quite
contrary to the Taylor-Russell picture of convex curvature at low selection
ratio).

By neglecting the possibility of adaptation, some discussions of the linear
relation between utility and validity tend to give a misleading picture of the
value of testing per se. If the treatment presently in use is fixed, as it is in
some industrial situations, small increases in validity of selection procedures
will result in considerable benefit. If adaptation of the treatment is possible,
however, and selection ratio is low, much of the gain can be achieved by
adjusting the treatment to the average man and thus raising the utility of the
a priori strategy. If selection is then employed, the increase in utility which
results is linearly related to validity, but it is possibly quite modest in
amount. Further adaptation of treatment to the selected group brings additional
gains which are parabolically related to validity.

Our analysis of adaptive selection surely calls for a reconsideration by
those who have used the linear relation to "prove to management" that tests
are beneficial. The "profit" they have piled on the scales includes a
substantial bag of gold that might be earned by another branch of the business
if testing were abolished. Testing is usually beneficial under adaptive
conditions, but not as beneficial as has been claimed when only fixed treatment
is considered.

"What is a test of given validity worth?" is a complicated question. The
total benefit achieved through use of a test is of more concern to the
institutional user than is the proportion of possible benefit or improvement
obtained. This total benefit is increased, generally speaking, with increases
in the number of persons to be tested, the importance of the criterion
performance to the institution, and the extent of individual differences in
performance. The benefit decreases with greater cost of testing. Therefore a
test of validity .20 in one situation may be more beneficial than a test of
validity .60 in another. The characteristics of the specific decision determine
"what the test is worth".


TWO-STAGE SEQUENTIAL SELECTION 


The preceding chapter assumed that the selector would make a single terminal
decision. The use of investigatory decisions, mentioned in Chapter II, will now
be considered. Efficiency of testing is often improved by a sequential plan
allowing the decision maker to continue testing whenever he is in doubt about
acceptance or rejection of an individual.


POSSIBLE USES OF SEQUENTIAL METHODS 


Sequential acceptance testing 


Sequential methods were first developed to meet the requirements of industrial
inspection, but their scope has been extended to cover all testing of
statistical hypotheses. The quotation from Girshick (see page 2) indicates that
they are now "the rule rather than the exception" in statistical decisions. The
original industrial inspection problem is remarkably similar to personnel
selection. Either the manufacturer or the consumer inspects samples of products
to determine whether a given lot is of acceptable quality. This is precisely
analogous to deciding whether an applicant is of acceptable quality on the
basis of a sample of his behavior, and one may therefore anticipate gains from
transferring the inspection methods to personnel work.


Sequential plans are strategies in which investigatory decisions are allowed.
In "double-sampling" or two-stage plans, a terminal decision is required after
the second testing. Multi-stage plans are also possible, investigatory decisions
being allowed after every stage; some terminal decisions are made after each
test but a steadily decreasing fraction of the cases are carried through
subsequent tests.


Sequential plans are beneficial because it costs something to gather
information. Obviously, it would always be better to administer the full series
of tests, whatever its length, to every person if observations cost nothing. As
was pointed out in the preceding chapter, test theory has generally ignored
costs of testing, but even quite small costs affect choice of testing
procedure. We may anticipate economies from adoption of a sequential plan in
personnel selection.


Considering only cost of testing, maximum benefit would always be attained
by using a multi-stage plan. Administering a sequential selection plan, however,
involves costs which would not be found in single-stage testing. A sequential
plan requires intelligent administration; in industry, the fact that varying
lengths of time may be required to inspect different objects sometimes creates
serious scheduling problems. Likewise, in individual processing of recruits or
applicants, the smooth flow of a processing line would be disrupted if time for
one of the examinations varied substantially from person to person. Where a
complete sequential plan would be awkward to administer, a double-sampling plan
is frequently a satisfactory compromise. It retains much of the efficiency of
the multi-stage plan, and yet is relatively easy to administer.

The present chapter explores the characteristics of two-stage sequential
sampling plans for personnel selection. This analysis will not per se indicate
just when sequential plans should be used, since the cost of administering the
plan is not included in the mathematical specifications. (In this, we follow the
practice of statisticians.) While our analysis will show benefits from the
adoption of sequential procedures, these benefits must always be weighed against
the unspecified administrative costs.


Experience in other fields suggests that sequential procedures attain a
given level of accuracy of decisions with about half the amount of testing
required by a single-stage plan. Arbous and Sichel (3) have studied a
"pre-screening" procedure for personnel testing which is in effect a modified
double sampling. Although their procedure is not the optimum sequential plan,
they report noteworthy savings in testing time. A similar pre-screening plan was
developed independently by Cochran (20). He points out that in a great variety
of situations it is advantageous to reduce the original group by stages. He
gives as one example the selecting of hogs for breeding, where some of the
original group can be dropped because of inferior weight. Final selection
among those who pass the weight test is based on the quality of their first
offspring (a test which is expensive to make because it requires long delay).

High cost of testing, as when costly radar equipment is used in a proficiency
test (cf. Evans (33)), makes sequential methods especially beneficial.
Particular advantages from sequential testing are also to be anticipated when
men are tested individually, since cost of testing is then relatively great.


Other sequential decisions 
Our interest in sequential methods is not confined to a literal application
of the acceptance-testing methods of industry. Sequential approaches to many
other problems may be equally important, as Fiske and Jones (34) demonstrated
by their review of psychological applications.

The sequential method departs from the traditional psychometric procedure of
standardizing test questions so that all persons are tested in the same
way. Questioning is pursued only until the required decisions can be made;
testing time can therefore be used more efficiently. Many forces appear to 
be compelling increased concern for efficiency in testing. For one thing, the 
identification of more and more aptitudes relevant to various occupations or 
job assignments means that thorough testing over all significant fields will 
henceforth require more time than is usually available. In proficiency and 
achievement testing also, it is increasingly recognized that a very large number 
of outcomes need to be measured in order to assess the individual and his in- 
struction. A further reason for highly efficient testing is that tests are being 
introduced where only very small amounts of testing time are available. This 
is notable in military research where it is desired to test men on active duty 


in advanced areas. Such pressures already confront psychologists with demands
to produce a test to measure this or that ability in five minutes of testing
time. Test designs therefore must be maximally efficient.


Frequently the tester is required to decide among several treatments for
each individual. Job classification is an example. In clinical testing also, one
ordinarily has many hypotheses about the individual which he may accept or
reject. In these instances, the problem at the first stage of testing is to
divide the alternatives (rather than the people) into three classes: those
accepted, those rejected, and those to be studied further. What alternatives
remain to

er information before making

The first test may be one of a group of dissimilar tests or it may be a short
test made up of representative items from the complete battery, the remainder
of the battery being considered as the second test. As a matter of convenience,
we express the scores on the two tests in terms of independent components, y_1
being the standardized score on the first test and y_2 being the standardized
score on the other test after the first test is partialled out.


Alternative testing plans 


At least five different selection strategies are available (without
considering the possibility of altering the lengths of tests y_1 and y_2):
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1. Non-sequential battery. Administer the total battery to all men, and 
accept those having highest scores on an optimally weighted compo- 
site. This is the conventional selection method, and we shall here- 
after refer to it as the "Battery" procedure. 

2. Single screen. Administer either test alone, and base all decisions 
on it. 

3. Sequential. After the first test is given, divide the group into three 
portions: those accepted, those rejected, and those to be given the 
second test. Base final decisions for the last-named men on the 
composite of both tests. We hereafter refer to this as the "Sequen- 
tial" strategy even though Pre-reject and Pre-accept are also sequen- 
tial devices. 

4. Pre-reject. After the first test, reject some men and continue to
test all others. For these others, base final decisions on the compo-
site of both tests.

5. Pre-accept. After the first test, accept some men and continue to
test all others. For these others, base final decisions on the com-
posite of both tests.

Figures 11 and 12 will aid in distinguishing these procedures. Figure 11
shows the joint distribution of y_1 and the independent second variable y_2.
Three lines are located in this figure. The line at y_1' is a cutoff below
which persons are rejected after the first test. The line at y_1'' is a cutoff
above which persons are accepted on the basis of the first test alone. The
slant line cuts off persons whose weighted composite score Y = f(y_1, y_2) is
above a certain level. Any person is accepted whose composite score on both
tests is above the line, and vice versa. The lines divide the distribution into
areas I, II, III, and IV. Each strategy represents a different treatment of
these areas, as shown in Figure 12.

Figure 11. Sequential selection procedure
(axes: first score y_1, with cutoffs y_1' and y_1''; second component y_2)

Figure 12. Alternative methods of selection
(panels: Non-sequential battery, Sequential, Pre-reject, Pre-accept)
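
The way the strategies treat one applicant differently can be sketched as a
small decision function. Everything named here is an assumption made for
illustration: the composite weights, the cutoff values in the example, and the
function names. The second component is passed in for every applicant so that
strategies can be compared, although in practice it would be observed only when
the second test is actually given. The single-screen strategy is omitted.

```python
def composite(y1, y2, w1=0.6, w2=0.8):
    """Hypothetical weighted composite of the two independent components."""
    return w1 * y1 + w2 * y2


def decide(strategy, y1, y2, low_cut, high_cut, composite_cut):
    """Return (decision, second_test_given) for one applicant.

    strategy is one of "battery", "sequential", "pre-reject", "pre-accept".
    low_cut and high_cut are the first-test cutoffs; composite_cut is the
    cutoff on the composite of both tests.
    """
    if strategy in ("sequential", "pre-reject") and y1 < low_cut:
        return "reject", False          # rejected on the first test alone
    if strategy in ("sequential", "pre-accept") and y1 >= high_cut:
        return "accept", False          # accepted on the first test alone
    decision = "accept" if composite(y1, y2) >= composite_cut else "reject"
    return decision, True               # decided on the composite of both tests


if __name__ == "__main__":
    cuts = dict(low_cut=-1.0, high_cut=1.0, composite_cut=0.5)
    for strategy in ("battery", "sequential", "pre-reject", "pre-accept"):
        print(strategy, decide(strategy, y1=-2.0, y2=3.0, **cuts))
```

In the example an applicant with a very low first score but a very high second
component is accepted by the battery and pre-accept procedures and rejected,
without further testing, by the sequential and pre-reject procedures.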

The Pre-reject strategy is a version of the Arbous-Sichel method. They
propose that an inexpensive portion of the battery be used as a first screen to
eliminate those men who have very low probability of passing on the battery as
a whole. Their method is not optimum, since it neglects the possibility of
accepting some men on the first screen and does not allow for adjusting risks
according to the cost of the second test.

Cochran's method also is a pre-reject procedure. While he argues for the
importance of taking cost of testing and selection ratio formally into account,
his procedure does not do so. Indeed, he says (20, p. 455) that there "does not
seem to be a useful general solution in functional terms". Our result in
Appendix 4, however, is offered as such a solution for those situations where a
linear payoff function applies.


Evaluation assumptions 


At least three different methods of specifying evaluations are available,
any one of which leads to a somewhat different location of the cutoff points
y_1' and y_1''. The first is the use of a continuous scale for evaluating
outcomes in utility units and specifying a relation between payoff and test
information. If
the a priori distribution of test scores and payoffs is known, and if fixed
treatment is used, the strategy which maximizes payoff can be determined for 
a given cost of testing. This method is employed in Appendix 4, using a linear 
relation of payoff to test score and assuming a normal distribution of scores 
and payoffs. While such a system of cardinal utilities is mathematically satis- 
fying, it requires difficult computations. Approximate procedures based on 
Neyman-Pearson risks (see below) are therefore generally used at present to 
study practical problems. We shall use such a method to study multi-stage 
sequential procedures in Chapter VII and Appendix 6. 

Neyman-Pearson risks were introduced into general hypothesis testing as 
a modification of the confidence-level system of Fisher. While Fisherian 
analysis reports the risk of rejecting a null (or other a priori) hypothesis when 
it is true, the investigator wishes also to consider the risk of accepting this 
hypothesis when some alternative hypothesis is true. Neyman and Pearson 
drew attention to these two kinds of "errors", and used them to compare the 
power of various statistical tests. A decision maker is required to indicate 
the errors he wishes to avoid and to state what risk of each error he is willing
to tolerate. (Unless he can tolerate some degree of error, he cannot use a
sampling plan to get information.) In a selection plan, the decision maker might
wish to eliminate men whose true aptitude is below a standard score of +.5,
for example. It would not ordinarily be a matter of concern if the plan admitted
some men at .48 or rejected some at .52, but there is some magnitude of error
which would be serious. The decision maker is perhaps willing to admit some
men as low as .35, on the theory that they can be discovered and eliminated,
and wishes to minimize the risk of missing men of aptitude .55. Thus we have
two tolerance limits, .35 and .55. He now specifies the risk for each error,
perhaps stating that he is willing to tolerate admitting 10% of the men at
aptitude .35 (the risk will be less than .10 for men below .35), and will
tolerate a risk of missing 2% of the men at .55 (less above that point). In the
language generally used for this system, the $\alpha$ risk is said to be .10,
and the $\beta$ risk, .02. Once these are specified, the selection procedure
requiring the minimum amount of testing can be readily determined. This strategy
is not dependent on an assumed a priori distribution. To compute the expected
net utility it would be necessary to make such an assumption, however, since
different distributions result in a different total frequency of each kind of
error.
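The way two stated risks fix a sequential stopping rule can be sketched with Wald's approximate boundaries for the sequential probability ratio test. This is an illustrative sketch only: the function names are hypothetical, and the assumption that each successive score is normally distributed about the man's true aptitude with a known standard deviation `sigma` is ours, not the text's.

```python
import math


def wald_log_boundaries(alpha, beta):
    """Wald's approximate stopping bounds on the log likelihood ratio.

    alpha: tolerated risk of accepting a man at the lower tolerance limit.
    beta:  tolerated risk of rejecting a man at the upper tolerance limit.
    """
    accept_at = math.log((1.0 - beta) / alpha)  # stop and accept at or above
    reject_at = math.log(beta / (1.0 - alpha))  # stop and reject at or below
    return reject_at, accept_at


def sprt_decision(scores, low, high, sigma, alpha, beta):
    """Hypothetical helper: decide from successive scores, stopping early.

    Assumes normal scores with mean equal to true aptitude and known sigma.
    Returns (decision, number of scores used).
    """
    reject_at, accept_at = wald_log_boundaries(alpha, beta)
    llr = 0.0
    for n, x in enumerate(scores, start=1):
        # Log likelihood ratio of "aptitude = high" against "aptitude = low".
        llr += (high - low) / sigma**2 * (x - (low + high) / 2.0)
        if llr >= accept_at:
            return "accept", n
        if llr <= reject_at:
            return "reject", n
    return "continue", len(scores)


# Risks and tolerance limits taken from the example in the text.
reject_at, accept_at = wald_log_boundaries(alpha=0.10, beta=0.02)
```

With the risks of the example, testing continues while the accumulated log likelihood ratio lies between the two bounds. The unequal risks make the bounds asymmetric, so under these assumptions more evidence is demanded before rejecting a man than before accepting him.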


A selection plan based on Neyman-Pearson risks does not weight costs of
testing explicitly. Instead, the decision maker is required to take them into
account implicitly in setting tolerances and risks. Determination of a
sequential plan and study of its performance characteristics has been reduced
to a simple routine procedure (35, 61) by studies performed during World War II
by the Statistical Research Group at Columbia University, working under the
leadership of Wald. It is generally believed that the methods based on risks
are an adequate approximation to the procedures based on a cardinal utility
scale, for those cases to which they apply.

The evaluation of strategies in terms of the number of "hits" and "misses",
discussed above in connection with the work of Taylor-Russell and Berkson,
might also be applied to sequential problems. Indeed, Arbous and Sichel employ
this evaluation to describe the efficiency of their plan even though they employ
risks in developing it.


Location of cutoffs 


In our treatment of the double-stage sequential procedure, the cutoff scores
on the first test and on the total battery (the lines $y_1'$, $y_1''$ and MN in
Figure 11) are to be located so as to maximize utility for any selection ratio
and strategy, using cardinal utilities and costs. The mathematical solution is
presented in Appendix 4. The same cutoff $y_1'$ is associated with a particular
MN for both Sequential and Pre-reject strategies, and the same $y_1''$ for
Sequential and Pre-accept. Whichever test is designated $y_1$ will be given to
all men. Persons falling in any score array on $y_1$ are accepted or rejected at
once if screening on the second test will not raise their total contribution
enough to compensate for the cost of testing.

There is no simple formula to locate $y_1'$ and $y_1''$ directly, but the
vertical coordinates $y_2'$ and $y_2''$ of points M and N depend on only two
parameters, $C_{y_2}$ and $r_{y_2 Y}$. These represent the cost of the second
screen and its correlation with the total battery (remembering that any
component correlated with the first test has been removed from $y_2$). The
following relation specifies $y_2'$:

    $$y_2'\,\phi(y_2') = \xi(y_2') - \frac{C_{y_2}}{\sigma_e\, r_{Ye}\, r_{y_2 Y}}$$    [9] (4.18)

$y_2''$ is equal to $y_2'$ and opposite in sign.

The foregoing equation is very much like equation (1.13), which expresses
the optimum selection ratio in single-stage selection (Figure 6). An analogous
cost-benefit ratio dictates the selection ratio within the array for the
double-stage problem. For the lower boundary the second test should not be used
to select among men in a $y_1$ array if the battery cutting score would accept
fewer persons in the array than the optimum selection ratio for a test used
singly which has cost $C_{y_2}$ and validity equal to the independent validity
of the second test. For the upper boundary the second test should not be used
if the battery cutting score rejects fewer persons than the optimum selection
ratio for maximum gain per man rejected when the second test is used singly.
In order to specify the sampling plan, we choose a cutoff value Y = Y',
where Y represents the best weighted composite of $y_1$ and $y_2$. This cutoff
leads to a different proportion of persons accepted under each of the
strategies. By the method described in Appendix 4, $y_1'$ and $y_1''$ are
computed. The utility for any strategy is a function of the expected payoff of
the men selected (summed over the shaded areas) less the cost of testing. The
cost of testing is $C_{y_1}$ plus $C_{y_2}$ times the proportion of persons
given the second test. From (4.8) one can calculate the utility for any
strategy and, from (4.9), the selection ratio corresponding to Y'. By taking a
series of values for Y', we generate a curve showing utility as a function of
selection ratio.

Figure 13. Gain in utility with sequential and non-sequential strategies


Comparison of sequential and non-sequential strategies 


Curves for these and other various sets of parameters were obtained by
programming* the equations of Appendix 4 for the Illinois Automatic Electronic
Computer (ILLIAC). Figures 13, 14, and 15 are based on three sets of
parameters, chosen to illustrate several conclusions. An examination of
Figure 13 will acquaint the reader with the type of information to be obtained.
Utility per man tested is plotted as a function of selection ratio for three
testing strategies. The figure is based on two parameters, $r_1$ (written
$r_{y_1 Y}$ in Appendix 4) and $C_2$ (elsewhere written
$C_{y_2}/\sigma_e r_{Ye}$). Fixing $r_1$ determines the contribution of the
second test to the battery, since the sum of squares of the independent
contributions must be 1.00. Dividing cost by $\sigma_e r_{Ye}$ (and also
dividing $\Delta U$ by $\sigma_e r_{Ye}$ before plotting) has no effect on the
shape of the functions to be discussed and makes it simpler to compare various
strategies since it reduces the number of parameters to be considered. In a
practical problem the validity $r_{Ye}$ of the total battery is fixed by the
choice of tests.

* With the assistance of Jack C. Merwin.

Figure 14. Gain in utility with sequential and non-sequential strategies
($C_2$ = .05, $r_1$ = .80)

The cost $C_1$ of the first screen does not enter into the comparison of
strategies per se, since it is charged against utility equally in all
strategies. Thus for the purpose of comparing various strategies after a first
test has been designated, no generality is lost by setting $C_1$ at zero. In
general the cheaper test should be used as the first screen when the zero-order
validities of the
two tests are roughly equal or when the cheaper test is also the more valid.
But if the more expensive test is the more valid, the choice of first screen 
depends on the selection ratio. In our analysis we tacitly assume that the test 
chosen as first screen is preferable at all selection ratios. In Chapter VII we 
consider the problem of adjusting the length of the first test optimally when it 
is a representative sample of the total battery. 

The dotted line in Figure 13 indicates the utility obtained by administering 
the first test alone (Single-screen). This curve is similar to the cross-sections 
in Figure 5. The broken line indicating utility under the Battery strategy is 
likewise consistent with Chapter IV. In this figure, the cost of the second test 
is so great that Battery is never as profitable as Single-screen. 

The solid line describes the utility yielded by the Sequential procedure at
each selection ratio. At extreme selection ratios, Sequential comes close to
the curve for Single-screen. At all selection ratios, Sequential is equal or
superior to either of the other strategies. We shall find that this is a general
conclusion.

Figure 15. Gain in utility with sequential and non-sequential strategies

Figure 14 employs tests with the same relative validity as Figure 13, but 
the cost of the second test is lowered. This raises the utility for Battery at 
all selection ratios so that it is now superior to Single-screen at intermediate 
selection ratios. Sequential also yields more benefit than formerly, but its 
advantage over Battery has declined. Figure 15 retains the same cost, but 
raises the contribution of the second test and correspondingly decreases the 
contribution of the first. Comparing this to Figure 14, we find that the change 
has lowered the value of Single-screen, giving Battery greater superiority. 
But Sequential, which uses the second test only in relatively doubtful decisions, 
is still superior to both the other strategies. 

In the subsequent illustrations pointing out various trends in the computed
results, the parameters of the decision problem are generally taken to be
$\phi$ = .50, $r_1$ = .70, $r_2$ = .71 (which values would be equal save for the
necessity of rounding in computation) and $C_2$ = .10. The values $\Delta U$
and $C_2$ are defined as in Figures 13-15. It should be noted that the vertical
scale in Figure 16 is different from that of the previous figures.

The advantage of Sequential over Battery and over Single-screen is shown
in Figure 16. At extreme selection ratios the advantage over the conventional
Battery procedure equals the cost of the second test. At selection ratios near
.50, Sequential has relatively little superiority. The figure emphasizes the
comparison of the Sequential strategy to the better of the two non-sequential
procedures. At extreme selection ratios it is compared to Single-screen, and
at intermediate selection ratios, to Battery. As expected, Sequential is
consistently superior by some slight amount, but its most striking advantage
occurs neither at extreme selection ratios nor where $\phi$ = .50. Battery and
Single-screen yield equal utility at two selection ratios where cost and
contribution of the second test are exactly balanced. Sequential always offers
its maximum advantage at these selection ratios.

Figure 16. Comparison of sequential and best non-sequential strategy
(horizontal axis: selection ratio $\phi$; curves: advantage over best
non-sequential strategy, advantage over Single-screen)
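The two selection ratios at which Battery and Single-screen yield equal utility can be located numerically. The sketch below rests on assumptions that should be read as ours: utilities are expressed in units of $\sigma_e r_{Ye}$, the cost of the first screen is set at zero, the gain from Single-screen is taken as $r_1 \xi(y')$ and the gain from Battery as $\xi(y') - C_2$, following the single-stage form of the utility function. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution of scores


def single_screen_gain(ratio, r1):
    """Gain per man tested when only the first test is used (assumed form)."""
    cut = STD.inv_cdf(1.0 - ratio)
    return r1 * STD.pdf(cut)


def battery_gain(ratio, c2):
    """Gain per man tested when both tests are given to everyone (assumed form)."""
    cut = STD.inv_cdf(1.0 - ratio)
    return STD.pdf(cut) - c2


def crossover_ratios(r1, c2):
    """Selection ratios where (1 - r1) * xi(y') = c2; requires r1 < 1.

    Returns None when the second test never repays its cost.
    """
    target = c2 / (1.0 - r1)  # ordinate at which cost and contribution balance
    peak = STD.pdf(0.0)
    if target >= peak:
        return None
    cut = math.sqrt(-2.0 * math.log(target / peak))
    return 1.0 - STD.cdf(cut), 1.0 - STD.cdf(-cut)


# Parameters used for the later illustrations: r1 = .70, C2 = .10.
low_ratio, high_ratio = crossover_ratios(r1=0.70, c2=0.10)
```

Under these assumptions the two strategies are equally good at selection ratios of about .27 and .73 for the illustrative parameters. A cost high enough that `crossover_ratios` returns `None` corresponds to the case in which Battery is never as profitable as Single-screen.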


Effect of variation in cost 


The cost of the second test obviously determines which of the strategies
should be used. When a person is given the second test more information is
obtained, but if cost of testing is high, the decision maker should seek this
information only for the most doubtful decisions if at all. The change of
utility with cost is shown in Figure 17, with $r_1$ and selection ratio held
constant.

Figure 17. Utility as a function of cost of second test
(horizontal axis: relative cost, $C/\sigma_e r_{Ye}$)

The following points should be noted. As cost increases (or $\sigma_e r_{Ye}$
decreases), Battery becomes less and less beneficial. At some point
Single-screen becomes superior to Battery. At low cost, Sequential is slightly
better than Battery; at high cost, slightly better than Single-screen. The
greatest difference occurs, as before, where the Single-screen and Battery
lines cross. The pictured relationships are typical, although the magnitude of
differences depends on the parameters. It should be realized that if $C_2$
becomes very low, the test now given second would be the better first screen.


Effect of variation in $r_1$


The greater the weight of $y_1$ in the composite the greater the benefit from
both Single-screen and Sequential (Figure 18). At low values of $r_1$, Battery
is about as good as Sequential. (Here it would perhaps pay to give the second
test only.) As $r_1$ increases, the advantage of Sequential over Battery
increases, but its advantage over Single-screen diminishes and finally becomes
negligible. More extreme selection ratios or higher costs of second test would
enlarge the region of $r_1$ in which Sequential is only slightly better than
Single-screen.

Figure 18. Utility as a function of relative validity of tests


PRE-REJECT AND PRE-ACCEPT STRATEGIES 


Under all conditions of testing, the utility from the Pre-reject strategy lies 
between that of Battery and Sequential (Figure 19). When selection ratio is 
high, very few people will be rejected at the first stage and Pre-reject has 
negligible advantage over Battery; when selection ratio is low, Pre-reject is 
nearly as advantageous as Sequential. Pre-accept has a similar relation, 
approaching Sequential at high selection ratios and approaching Battery at low 
selection ratios. The curves for Pre-accept and Pre-reject are mirror images. 

Figure 19. Utility with Pre-accept and Pre-reject strategies

Improvement in utility for these incomplete sequential procedures, over the
best non-sequential, is plotted in Figure 20 as a function of selection ratio.
Pre-reject gives only half the advantage of the full Sequential method when
selection ratio is .50, and gives much less advantage at higher selection ratios.
It is, however, as good as Sequential at low selection ratios.

Figure 20. Comparison of Pre-reject to Sequential and best non-sequential
strategy (curves: advantage over Battery, advantage over Single-screen)


We conclude that under our assumptions the proposal of Arbous and Sichel,
and the complementary Pre-accept method are not superior to Sequential.
There are, however, practical situations where a complete sequential strategy
cannot be considered. For example, if the second "test" consists of admitting
men to the training course and weeding out those who do poorly, "pre-acceptance"
is meaningless since all men accepted have to go through the training. Cochran's
breeding experiments are similar, in that second-screen data (quality of first
litter) are necessarily obtained for all subjects not rejected.


AMOUNT OF TESTING SAVED 


The core of the Sequential strategy is the correct determination of the
cutting scores $y_1'$ and $y_1''$. Arbous and Sichel provide charts to indicate
optimum cutting scores in their method, for various selection ratios and $r_1$.
In lieu of such charts or tables for our strategy, Appendix 4 outlines a
procedure by which the tester may readily compute the desired cutoffs, knowing
the parameters of his decision problem.

Typical results are plotted in Figures 21 and 22. In each diagram, the
cutting scores are shown as a function of $r_1$, assuming a cost of .10 units
for the second test. Setting Y' = 0 (Figure 21) will yield a selection ratio of
.50 regardless of the value of $r_1$. When the first test accounts for nearly
all the battery variance, the two cutting scores coincide at zero, and no one
is given the second test. As the contribution of the second screen increases,
more and more persons are given the second test. When $r_1$ = .80 and
$r_2$ = .60, the second test is given to persons with $y_1$ between -.46 and
.46, i.e., to about 35% of the applicants. The Sequential strategy saves 65% of
the second-screen testing ordinarily required.

Figure 21. Cutting scores on first test when Y' = 0
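The figures quoted for $r_1$ = .80 can be checked with a short computation. The sketch assumes that relation [9] has the form $y_2'\phi(y_2') = \xi(y_2') - C_2/r_2$ when cost is expressed in units of $\sigma_e r_{Ye}$, and that the battery composite is $Y = r_1 y_1 + r_2 y_2$ with independent standardized components, so that the first-test cutoffs are the $y_1$ coordinates of the points on the line Y = Y' whose $y_2$ coordinates are $\pm y_2'$. Function names are hypothetical, and bisection stands in for whatever procedure Appendix 4 prescribes.

```python
from statistics import NormalDist

STD = NormalDist()


def excess(y):
    """xi(y) - y * phi(y): normal ordinate minus y times the upper-tail area."""
    return STD.pdf(y) - y * (1.0 - STD.cdf(y))


def second_test_coordinate(relative_cost, r2, lo=-10.0, hi=10.0):
    """Solve excess(y) = relative_cost / r2 by bisection (excess is decreasing)."""
    target = relative_cost / r2
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if excess(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def first_test_cutoffs(battery_cut, r1, relative_cost):
    """Lower and upper first-test cutoffs for a given battery cutoff Y'."""
    r2 = (1.0 - r1**2) ** 0.5  # independent contributions: r1^2 + r2^2 = 1
    y2 = second_test_coordinate(relative_cost, r2)
    return (battery_cut - r2 * y2) / r1, (battery_cut + r2 * y2) / r1


def proportion_given_second_test(lower, upper):
    """Share of applicants between the cutoffs; zero if the cutoffs cross."""
    return max(0.0, STD.cdf(upper) - STD.cdf(lower))


lower, upper = first_test_cutoffs(battery_cut=0.0, r1=0.80, relative_cost=0.10)
tested = proportion_given_second_test(lower, upper)
```

For Y' = 0, $r_1$ = .80 and a relative cost of .10, this reproduces cutoffs near -.46 and .46 and a tested proportion of about 35%, in agreement with the values stated above.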


One would be interested in the relation of first-screen cutoffs to $r_1$ for
selection ratios other than .50, but to determine this relation would require
a very large number of computations. Any given Y' yields a different selection
ratio depending on $r_1$; to obtain a desired selection ratio, therefore, one
must choose a value of Y' depending on $r_1$. It is not too difficult to compute
first-screen cutoffs for a particular value of Y'. This has been done for Y' = 1;
the results are plotted in Figure 22. When $r_1$ = 1.00 a selection ratio of .16
is obtained, but as $r_1$ decreases the selection ratio very rapidly decreases.
The proportion of persons given the second test increases and then decreases.
This is not particularly meaningful, however, since when the battery score is
determined almost entirely by the second test the selection ratio approaches
zero for this value of Y'.


Figure 23 shows the proportion of the total group given the second test at
each selection ratio, for one set of parameters. Also plotted is the proportion
who would be given the test if Pre-reject strategy were used. A sequential
procedure greatly reduces the amount of testing when the selection ratio moves
toward either extreme. The striking advantage of Sequential over Pre-reject
in terms of testing effort saved confirms our earlier conclusion.

Figure 23. Proportion of persons given second test


PLACEMENT DECISIONS 


In placement, persons are divided among two or more groups who remain 
within the institution and receive different treatments. Division of a group 
between two treatments has often been spoken of as screening or selection 
(e.g., "selection of students for an accelerated program", "screening of
emotionally disturbed cases for individual examination", "selection of prisoners
for parole", etc.). In each of these instances, the "selected" group is given 
one treatment while the others remain within the institution (broadly defined) 
and are given some other treatment. Therefore, these are placement decisions. 
Although placement may be considered as a special case of classification, it 
is of importance in itself because most measurement and prediction by means 
of tests can be interpreted in terms of the placement model. 

In fixed-treatment placement with a limited number of categories, the score 
scale is divided into segments, and persons in each successive segment of the
scale are assigned to a different treatment. Sectioning of students into classes 
according to learning ability is a prototype placement problem. Illustrations 
are also found in some types of job assignment, for example when a proficiency 
test is used to determine the level of responsibility (pay grade) to which a 
typist will be assigned. Each level has its corresponding duties and pay, and 
is characterized by a different function relating utility to aptitude. These 
placement decisions involve a rather small number of alternative treatments. 
An indefinitely large number of alternatives may be used, however, persons 
being located within smaller and smaller intervals on the scale. In the limit, 
the placement problem thus becomes a problem in measurement or estimation. 


Even though an institutional decision maker thinks of his task as
"prediction" or "measurement", he may use only a few discrete categories. An
example is the fitting of teaching methods to the pupil's estimated IQ. One
would not treat those with IQ 109 differently from those with IQ 110, but one
could modify treatment with each appreciable difference. Such use of
measurements is ordinarily better described as a problem in placement with
adaptive treatment than as placement with fixed treatment, because past
experience has determined what treatment is most suitable at each IQ level and
the decision maker is free to use that best treatment.

It is appropriate in this chapter to consider a second type of adaptation,
namely the altering of quotas on the basis of test information. For instance,
three rather different levels of training may have been developed, each
involving its own course outline, test materials, and instructional aids; these
programs would be difficult to change and so treatments would be fixed. But the
proportion assigned to each treatment could be altered. Improvement in test
validity would warrant assignment of more persons to the extreme categories.
Adjustment of quotas may also be beneficial when treatments are adapted, if
the maximum number of treatments is fixed. The discussion in Chapter IV
regarding the optimal selection ratio deals with a rather similar problem.


RELATION OF UTILITY TO VALIDITY 


Fixed-treatment conditions 


To examine what utility results with fixed treatments and fixed quotas, we
employ assumptions similar to those used with selection. Appendix 1 presents
the detailed mathematical argument. There are n different treatments, fixed
a priori, and we assume that for each treatment there is a linear function
relating payoff to test score. The score continuum is divided into n intervals
each bounded by $y_t'$ and $y_t''$, which are fixed so that the proportion of
the distribution falling in each segment equals the quota set for the
corresponding treatment t. With each treatment is associated a value of
$\sigma_{e_t} r_{ye_t}$, the slope of the payoff function. The gain in utility
per man tested is

    $$\Delta U\ (\text{fixed}) = \sum_t \sigma_{e_t} r_{ye_t}\,\Delta\xi_t - C_y$$    [10] (1.19)

Here, $\Delta\xi_t$ stands for $\xi(y_t') - \xi(y_t'')$; below, we shall also
write $\Delta\phi_t$ for $\phi(y_t') - \phi(y_t'')$. At some points it will be
helpful to write $m_{yt}$ for $\sigma_{e_t} r_{ye_t}$. It is possible to rewrite
[10] in terms of the differences between validity coefficients for adjacent
treatments:

    $$\Delta U = \sum_{t=2}^{n} \xi(y_t')\,\bigl[\sigma_{e_t} r_{ye_t} - \sigma_{e_{t-1}} r_{ye_{t-1}}\bigr] - C_y$$    [11]

The bracketed term is the covariance of score y with the difference in payoff
under two adjacent treatments; these covariances are weighted by $\xi(y_t')$ to
obtain the total benefit. Clearly, since the test may have a different validity
coefficient for each treatment, utility is not a simple function of any single
validity coefficient. However, the greater the correlations of the test with
the differences in payoff between treatments, the greater is the utility from
using the test for placement.
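Equations [10] and [11] can be compared numerically. The sketch below assumes standardized, normally distributed scores, segments numbered from the lowest scores upward, and $\xi$ denoting the normal ordinate; the quotas, slopes, and function names are hypothetical illustrations rather than values from the text.

```python
from statistics import NormalDist

STD = NormalDist()


def cutting_scores(quotas):
    """Interior cutting scores that give each ordered segment its quota."""
    cuts, cumulative = [], 0.0
    for q in quotas[:-1]:
        cumulative += q
        cuts.append(STD.inv_cdf(cumulative))
    return cuts


def gain_fixed(quotas, slopes, cost):
    """Form [10]: sum over treatments of slope_t * (xi(lower_t) - xi(upper_t))."""
    cuts = cutting_scores(quotas)
    # The ordinate is zero at both ends of the score continuum.
    ordinates = [0.0] + [STD.pdf(c) for c in cuts] + [0.0]
    total = sum(m * (ordinates[t] - ordinates[t + 1]) for t, m in enumerate(slopes))
    return total - cost


def gain_by_differences(quotas, slopes, cost):
    """Form [11]: ordinate at each cut times the change in slope across it."""
    cuts = cutting_scores(quotas)
    total = sum(STD.pdf(c) * (slopes[t + 1] - slopes[t]) for t, c in enumerate(cuts))
    return total - cost


# Hypothetical three-treatment problem.
quotas = [0.30, 0.40, 0.30]
slopes = [0.20, 0.50, 0.90]  # payoff slope under each treatment
gain = gain_fixed(quotas, slopes, cost=0.05)
```

The two forms agree for any quotas and slopes. When all slopes are equal the sum telescopes to zero, and the test returns nothing but its cost.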


It will simplify some of our discussion to express utility in terms of an
underlying aptitude s as was done in connection with adaptive selection. Since
we are restricting ourselves to a single test score y, we can without further
loss of generality introduce an s such that for every treatment
$r_{ye_t} = r_{ys}\, r_{se_t}$. Since $m_{st} = \sigma_{e_t} r_{se_t}$,

    $$\Delta U = r_{ys} \sum_t m_{st}\,\Delta\xi_t - C_y$$    [12]

In placement with fixed treatments and fixed quotas, utility is linearly related
to the component $r_{ys}$ of validity.


Adjusted quotas. Total utility is increased by adjusting the quotas even
when treatments are fixed in advance. Strategy should ideally be adjusted so
that each man is assigned to the treatment for which his expected payoff is
greatest. The most profitable cutting scores are determined, for particular
values of the treatment parameters, from equation (1.22). The cutting scores
are related to $r_{ys}$. As $r_{ys}$ approaches zero, the optimal cutting scores
depart from the mean; an increasing proportion of the cases are assigned to the
particular fixed treatment which is best suited to average men. This treatment
is also the one to which all persons should be assigned for the optimum a priori
strategy; we designate this treatment as $t_0$. (This is not necessarily the
b[...] discussed in adaptive selection.)

    $$\Delta U = r_{ys} \sum_t m_{st}\,\Delta\xi_t + \sum_t (\bar{e}_t - \bar{e}_{t_0})\,\Delta\phi_t - C_y$$    [13] (1.23)

where $\bar{e}_t$ denotes the payoff of the average man under treatment t.
The second term here is less than zero. As in selection, the simple formula
for gain in utility with fixed-treatment placement [10] includes a component
which could be obtained merely by a priori adaptation, giving the treatment
best for unselected men to everyone. For that reason, the gain from testing
expressed in [13] is less than the gain with the same treatments and fixed
quotas. The total utility including a priori utility is, however, greater with
adjusted quotas than with fixed quotas.

In equation [13] the values of $\Delta\xi_t$ and $\Delta\phi_t$ depend on
$r_{ys}$. $\Delta U$ with adaptive quotas is not a simple linear function of
$r_{ys}$. The function depends on the particular treatments to be used.

Benefit compared to selection. Fixed-treatment selection is obviously a
special case of fixed-treatment placement in which there are just two
treatments, for one of which $m_{yt}$ = 0. Making this substitution reduces
[10] to [2]. There may be combined selection-placement problems such that one
group of men is rejected while the others are assigned within the institution
on the basis of the same test. These too can be treated as special cases of
[10].

When just two treatments are in use, equation [11] becomes identical to [2],
save that $\sigma_e r_{ye}$ is replaced by
$\sigma_{(e_t - e_{t-1})}\, r_{y(e_t - e_{t-1})}$. Figure 5, presented in our
discussion of selection (p. 34), therefore also applies to the relation of
utility to validity in placement with two fixed treatments. It is also to be
noted that the discussion of two-stage sequential selection in Chapter V and
Appendix 4 applies to sequential placement into two categories, since the
covariance of y with differential payoff may be substituted for
$\sigma_e r_{ye}$ in all equations. It is important to emphasize that this
paragraph refers to the gain in utility with testing.


Validation of tests for placement with fixed treatments 


Utility depends on the differential payoff of the treatments as a function of
aptitude. If all $m_{st}$ were equal, the gain in utility as a result of
placement would be zero regardless of $r_{ys}$ (see [12]). Placement using a
univariate score is profitable only when the slope of the payoff function is
different for different treatments. This does not necessarily mean that the
test-criterion correlation $r_{ye_t}$ must differ from treatment to treatment,
since the slope $m_{yt}$ depends also on $\sigma_{e_t}$.

Although the implications of this fact for placement have not to our
knowledge been discussed, a comparable result established in connection with
differential prediction, i.e., classification, problems is well known (see, for
example, (13, 71)). A test of general intellectual ability having equal validity
for predicting success in engineering and in liberal arts does not predict in
which field a student will do best. A test of mathematical ability which may be
a poorer predictor for either curriculum assists in differential prediction
because it is more closely related to engineering than to liberal arts. This
relationship has been widely discussed in recent years in connection with both
classification of workers and differential diagnosis of patients.

Differential payoff, rather than predictive validity as usually measured, is
similarly important in placement. This implies that tests intended for
placement purposes in schools may have been validated in the wrong way. Where
two or more treatments depend on the same aptitude, testers have generally
been satisfied to examine the validity of a test for predicting criterion
differences within treatments instead of examining how well it predicts payoff
differences between treatments, i.e., the interaction of outcome and treatment.

Use of a test in placement rests on the assumption that no one treatment
yields greatest payoff over the entire range of scores in the population.
Numbering the segments of the y continuum in order from 1 to n, and numbering
the treatments correspondingly, we may rewrite [11] thus:

    $$\Delta U = r_{ys}\bigl[(m_{st_2} - m_{st_1})\,\xi(y_2') + (m_{st_3} - m_{st_2})\,\xi(y_3') + \cdots + (m_{st_n} - m_{st_{n-1}})\,\xi(y_n')\bigr] - C_y$$    [14]


The bracketed term summarizes the differential contribution of a perfect
measure of s to placement, taking into account the particular treatments and
the quotas for each treatment. This is multiplied by $r_{ys}$ to determine the
differential contribution of the test itself. Thus utility in placement depends
on two things: the power of the test to measure the aptitude dimension s, and
the power of s to predict differential payoff.

An aptitude dimension which is strongly related to criterion differences
within treatments (all $m_{st}$ being large) may have no relation to differences
between treatments (if the $m_{st}$ are equal). Consider a set of treatments for
which two aptitudes are relevant, aptitude $s_1$ being estimated with validity
1.00 by test $y_1$, and aptitude $s_2$ being estimated with validity .10 by test
$y_2$. This information alone does not permit us to say that test $y_1$ is
superior. Even adding the fact that within all treatments $s_1$ is a much better
predictor than $s_2$ does not settle the question. If $m_{s_1 t}$ is a constant
for all treatments, $y_1$ has no value for placement. The importance of an
aptitude for placement depends on the change in $m_{st}$ over the treatments
under consideration.


Publishers of scholastic aptitude or achievement tests frequently recommend
those tests for sectioning students. The claim of empirical validity is
supported by giving the correlation of the test with a criterion of success
within some treatment, or occasionally within a pooled group from several
treatments. It has not hitherto been recognized that what the consumer needs to
know is the slopes of the payoff functions or other data on the interaction of
test and treatment.


Tests presently used may be ineffective for placement even though they are
good predictors within a treatment. Possibly quite different types of items
would make superior placement tests, because qualities which determine
differential response to various treatments are not generally those which best
predict criterion performance within one treatment. General mental ability,
for example, is likely to be correlated with success in mathematics no matter
how the subject is taught. If the alternative teaching procedures are an abstract
deductive method and an applied inductive method, the brighter students should
do better with either approach. Payoff functions for both treatments against
general ability will have positive slope, and may or may not intersect within
the ability range. On the other hand, there may be other qualities of the
individual (say, interest in abstract problems, or liking for rigorous
reasoning) which would have quite different relations to the two treatments. A
measure which predicted success under one treatment and not the other would be
a much better aid to placement than a measure which predicts both.




Adaptive-treatment conditions 


Where allowable, adapting the treatment to the persons assigned to it is
expected to have advantages in placement as in selection. A priori, the best
adaptation is to assign all men to the treatment best for the average man. A
posteriori, the best treatment is determined from the general utility surface.
Appendix 3 demonstrates that gain in utility is a function of the square of the
validity with which the test measures the s dimension:

    $$\Delta U = \frac{r_{ys}^2}{4a} \sum_t \frac{(\Delta\xi_t)^2}{\Delta\phi_t} - C_y$$    [15] (3.6)


The parameter a reflects the curvature of the general utility surface for s.

Figure 24. Relation of gain in utility to validity (r_ys) and selection ratio
in adaptive-treatment placement

Figure 24 shows the relation of ΔU to validity and quota, in an adaptive
placement problem with two treatments. Here we assume that a = 1 and
C_y = .05. Restricting the problem to two treatments means that there is a
single cutting score y' between treatments, and

$$\Delta U = \frac{r_{ys}^{2}}{4a} \cdot \frac{\xi^{2}(y')}{[1 - \phi(y')][\phi(y')]} - C_y \qquad [16]$$

Here, φ(y') is the proportion of men given treatment t_1.
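The behavior of [16] can be explored numerically. The sketch below is only an illustration: it reads [16] as ΔU = r_ys² ξ²(y') / (4a φ(y')[1 - φ(y')]) - C_y, takes test scores to be normally distributed in standard form, and uses the values a = 1 and C_y = .05 assumed for Figure 24. The function name and its arguments are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumption: standardized, normally distributed test scores


def adaptive_gain_two_treatments(r_ys, cut, a=1.0, cost=0.05):
    """Gain in utility under the reading of [16] given in the lead-in.

    r_ys -- validity of the test as a measure of s
    cut  -- the single cutting score y' in standard-score units
    a    -- curvature parameter of the utility surface (assumed 1, as for Figure 24)
    cost -- cost of testing C_y (assumed .05, as for Figure 24)
    """
    ordinate = STD_NORMAL.pdf(cut)   # xi(y'): normal ordinate at the cutting score
    share = STD_NORMAL.cdf(cut)      # proportion of cases on one side of the cut
    return r_ys ** 2 * ordinate ** 2 / (4 * a * share * (1 - share)) - cost


if __name__ == "__main__":
    for r in (0.3, 0.5, 0.8):
        for cut in (0.0, 0.5, 1.0):
            gain = adaptive_gain_two_treatments(r, cut)
            print(f"r_ys = {r:.1f}  y' = {cut:.1f}  gain = {gain:+.4f}")
```

Because the quota enters only through ξ²/[φ(1 - φ)], the gain is largest when the cutting score lies at the mean, and with low validity the cost of testing can exceed the benefit.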

It is of interest to compare [16] with the corresponding equation [7] apply-
ing to adaptive-treatment selection. Equation [7] contains a term in r_ys. The
reason is that "adaptive-treatment selection" is actually a mixture of fixed
and adaptive placement: one group of men are given the treatment best suited
to their average aptitude while the other group are given a fixed treatment
(reject) for which m_st = 0. As a matter of fact, many mixtures of fixed and
adaptive placement can be envisioned which are of practical importance. It
is common practice to use a fixed treatment for the "normal" pupils in a
school, for example, but to segregate handicapped pupils and give them a
treatment adapted as well as possible to their capacities. Utility functions
for such mixed situations will contain both r and r² terms.

Figure 25. Optimum cutting scores for adaptive placement

Adjusted quotas. In Appendix 3 it is demonstrated that, with a specified
number of treatments, cutting scores should be located so that for any pair of
adjacent treatments

$$2y'_t = \frac{\Delta\xi_t}{\Delta\phi_t} + \frac{\Delta\xi_{t+1}}{\Delta\phi_{t+1}} \qquad [17]\ (3.15)$$

The cutting score should be located halfway between the means of the groups
assigned to adjacent treatments. If we plan to divide the group among n
treatments, [17] provides n-1 simultaneous equations. Since y'_t is included in
the argument of Δξ_t and Δφ_t, the equations are not readily solved. It is of
particular interest to note that in adaptive treatment the optimum cutting
scores do not depend on r_ys. When there are just two treatments, the optimum
procedure is to divide the group at the mean. With three treatments, the
optimum cutting scores are +.65 and -.65. This allocates 26% of the cases to
each of the two extreme groups, and 48% to the middle group. The optimum
cutting scores can be determined for any value of n by successive
approximations. These are indicated in Figure 25 for various numbers of
treatments; the proportion of cases assigned to each treatment can be read
from the percentile scale.
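The successive approximations mentioned above can be sketched as a simple iteration: compute the mean score of each group, move every cutting score halfway between the means of the adjacent groups, and repeat. The sketch assumes normally distributed standard scores and starts from equal quotas; the function names are hypothetical and this is not necessarily the computing routine used for Figure 25. Under these assumptions the iteration settles near ±.61 for three treatments, close to the rounded ±.65 quoted in the text.

```python
from statistics import NormalDist

N01 = NormalDist()  # assumption: standard normal distribution of test scores


def group_means(cuts):
    """Mean standard score of each group formed by the sorted cutting scores."""
    edges = [float("-inf")] + list(cuts) + [float("inf")]
    means = []
    for low, high in zip(edges, edges[1:]):
        share = N01.cdf(high) - N01.cdf(low)            # proportion in the group
        means.append((N01.pdf(low) - N01.pdf(high)) / share)
    return means


def optimum_cuts(n, sweeps=500):
    """Cutting scores for n adaptive treatments by successive approximation."""
    cuts = [N01.inv_cdf(i / n) for i in range(1, n)]    # start from equal quotas
    for _ in range(sweeps):
        means = group_means(cuts)
        # Each cut moves to the midpoint of the two adjacent group means.
        cuts = [(means[i] + means[i + 1]) / 2 for i in range(n - 1)]
    return cuts


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(n, [round(c, 2) for c in optimum_cuts(n)])
```

Validity does not appear anywhere in the iteration, which is the point made above: in adaptive treatment the optimum cutting scores do not depend on r_ys.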

Desirable difference between treatments. When choosing treatments from 


a large number of possibilities, one can use similar or markedly different 


treatments for persons having a specified difference in aptitude. Differences
of the form m_st1 - m_st2 will be smaller in the former case than in the latter.
For a fixed number of treatments, [17] defines the range of test scores within
each treatment group. The expected average aptitude in each group can then
be determined. The optimum value of m_st for the group is then determined
from (2.5). This implies that there is an optimum separation of treatments
for any given problem.


The greater r_ys, the more may adjacent treatments differ. With three
adaptive treatments, 26% of the men (average test score = 1.20) should be
assigned to the "upper" group. The treatment used should be that suited to
men whose average aptitude is 1.20 r_ys. With low validity, all persons are
assigned to treatments very similar to t. When y is a good measure of s,
treatments may differ to a much greater degree.
When raters, assessors, or other judges estimate individual differences, 


they tend to overdifferentiate. For example, counselors tend to predict that 
more students will earn grades of B or better than actually do so. They
ordinarily make insufficient allowance for error and therefore recommend
differentiating treatment more than their predictors warrant. In the language of
Chapter II, the assessor does not use the optimum strategy, which (for
adaptive treatment) is indicated by [17]. This inefficient strategy is one factor
contributing to the frequent finding that psychologists applying clinical judg- 
ment to objective data make poorer decisions than a statistical formula applied 
to the same data. Both empirical and mathematical aspects of such judgments 


have been discussed elsewhere (26). 


Effect of increasing fineness of discrimination 


We next inquire how utility varies when a test is used for a larger number 
of discriminations. When the number of different treatments n is increased, 
the decision maker faces a more difficult decision. It has generally been
believed that a higher standard of validity is required for a test to be useful in
making fine discriminations.
Fixed treatments, fixed quotas. An analysis of the change in validity for
fixed-treatment placement can be made only under fairly specific assumptions.
Suppose, for example, that our test is a measure of aptitude s and that we
employ those n treatments from the general surface which would be optimal
if r_ys were 1.00. Then, using (2.5) and substituting in [12],

$$\Delta U = \frac{r_{ys}}{2a} \sum_{t} \frac{(\Delta\xi_t)^{2}}{\Delta\phi_t} - C_y \qquad [18]$$

Now if we also assume that the same proportion of persons is assigned to each
treatment, Δφ_t = 1/n and

$$\Delta U = \frac{r_{ys}}{2a}\, n \sum_{t} (\Delta\xi_t)^{2} - C_y \qquad [19]$$

If some other set of fixed treatments were used or some other quotas, the
function would have a somewhat more complicated form. With uniform quotas,
the product n Σ (Δξ_t)² increases with n as shown in Figure 26. In the limit, as
n → ∞, n Σ (Δξ_t)² approaches 1.00.

Figure 26. Increase in benefit from placement with increased number
of treatments


Adaptive treatment. When equal numbers of persons are assigned to all
treatments, and treatments are adapted to fit the group, [15] reduces to

$$\Delta U = \frac{r_{ys}^{2}}{4a}\, n \sum_{t} (\Delta\xi_t)^{2} - C_y \qquad [20]$$

Figure 26 also indicates the relation of n to benefit from adaptive placement
with fixed equal quotas.
When both treatments and quotas are adapted, ΔU is related to n by a
curve almost identical to Figure 26. With very fine division of the scale, i.e.,
as we approach continuous measures,

$$\Delta U = \frac{r_{ys}^{2}}{4a} - C_y \qquad [21]$$


Under the conditions studied, the gain in utility as a result of increasing
the number of treatments is small. After the first few treatments, benefit
increases only very slightly with increased fineness of discrimination. With
only two treatments, n Σ (Δξ_t)² equals .64; with four treatments, .86. Since the
limit is 1.00 as n increases, the further increase is necessarily very gradual.
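The figures of .64 and .86 can be checked directly. The sketch below assumes a standard normal distribution of test scores and equal quotas, so that the cutting scores fall at the i/n quantiles; Δξ_t is taken as the difference between the normal ordinates at the two cutting scores bounding group t, with zero ordinate at the open ends. The function name is hypothetical.

```python
from statistics import NormalDist

N01 = NormalDist()  # assumption: standard normal distribution of test scores


def equal_quota_factor(n):
    """Return n * sum of squared ordinate differences for n equal-sized groups."""
    cuts = [N01.inv_cdf(i / n) for i in range(1, n)]
    # The ordinate is zero at minus and plus infinity, the outer bounds of the scale.
    ordinates = [0.0] + [N01.pdf(c) for c in cuts] + [0.0]
    return n * sum((high - low) ** 2 for low, high in zip(ordinates, ordinates[1:]))


if __name__ == "__main__":
    for n in (2, 3, 4, 6, 10, 50):
        print(f"n = {n:2d}  factor = {equal_quota_factor(n):.3f}")
```

The factor climbs quickly over the first few values of n and only slowly afterwards, which is the flattening described in the paragraph above.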


Everyone agrees that accuracy of measurement is advantageous, but three 


rather different types of advantage have been discussed in the literature. Keep- 
ing number of treatments fixed, we have already established that as measure- 
ment improves the resulting assignments achieve more nearly the maximum 
payoff from each individual. Secondly, we have seen that better measurement 
permits greater separation of treatments, with accompanying gains in utility. 

A third benefit sometimes expected is that better measurement warrants divid- 


ing persons among more categories; it remains to determine whether this is 


the case. 

It has generally been believed that there is a strong relation between the
accuracy of a test and the fineness of discrimination for which it may be used.
The error of estimate based on r_ys specifies limits within which s will be
expected to fall, and a more valid test estimates performance within any
treatment more accurately. For this reason it has been suggested that greater
accuracy permits one to place persons among a greater number of treatment
categories. Bloom (7), for one, has interpreted test reliability in terms of
placement decisions, requiring that frequency of erroneous placement shall
not exceed 0.1%. By this reasoning, a reliability of .56 justifies use of three
categories, .84 justifies five, and .96 justifies ten. The same approach could
be applied to r_ys (which equals the square root of the reliability if s is
equivalent to the "true score" on the test). The required coefficients would
then be .75, .92, and .98 respectively, for three, five, and ten categories.

Our results do not support the conclusion that greater reliability or validity
justifies division into more groups. Under the conditions leading to equations
[19] and [20], finer differentiation results in the same proportional increase
in benefit no matter what the value of r_ys. For example, the gain in utility
with four treatments, disregarding C_y, is always 7.5% above that with three
treatments. Our result differs from Bloom's because he evaluated decisions
by the number of erroneous placements rather than by the magnitude of errors.
We have counted gross errors of placement which lead to large changes in
outcome as more serious than fine ones, whereas he has counted all errors as
equal. Bloom's method of evaluating outcomes does not seem realistic for the
educational placement problems he discusses. The allowable number of
subdivisions, we conclude, has nothing to do with the accuracy of the test.

The value of increasing the validity of the test as a measure of the s
continuum is that it permits greater separation of treatments. With a good test
people may be divided into ten groups and given ten sharply distinct treatments,
i.e., treatments suited to quite different levels of aptitude. With a poor test
people may still be divided into ten groups, but the ten treatments should not
be extremely different.

The conventional view is close to ours only when fixed treatments and
adaptive quotas are involved. Suppose that there are six treatments, for
example, those appropriate respectively where s < -2, -2 < s < -1,
-1 < s < 0, 0 < s < 1, 1 < s < 2, 2 < s. If quotas may be adapted, an
increase in r_ys warrants placing more persons in the extreme categories. For
r_ys below .40, the optimal strategy is to assign about 99.5% of the cases to the
middle two categories, because that proportion of persons will have an estimated
s between +1 and -1. Practically speaking, only two categories are used. As
validity increases, more persons should be assigned to the second and fifth
categories. When r_ys = .60 about 5% of the cases will go into each of these
groups. Ultimately, with r_ys = 1.00, about 2% of the cases will go into the
first category, 2% into the sixth, and about 15% into the second and the fifth.
[...] appreciable frequency does increase with r_ys. Under adaptive conditions
[...]ments into one.
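The six-category example can be reproduced under two assumptions that the passage implies but does not spell out here: test scores y are standard normal, and the estimated aptitude of a person is r_ys times y, so that a boundary b on the aptitude scale corresponds to a cutting score b / r_ys on the test. The function name is hypothetical.

```python
from statistics import NormalDist

N01 = NormalDist()               # assumption: standard normal test scores
BOUNDS = [-2, -1, 0, 1, 2]       # aptitude boundaries of the six fixed treatments


def category_shares(r_ys):
    """Proportion of cases whose estimated aptitude falls in each of six categories."""
    # Estimated aptitude is r_ys * y, so boundary b is reached when y = b / r_ys.
    cumulative = [0.0] + [N01.cdf(b / r_ys) for b in BOUNDS] + [1.0]
    return [high - low for low, high in zip(cumulative, cumulative[1:])]


if __name__ == "__main__":
    for r in (0.3, 0.4, 0.6, 0.8, 1.0):
        shares = ", ".join(f"{100 * s:5.1f}%" for s in category_shares(r))
        print(f"r_ys = {r:.1f}: {shares}")
```

With r_ys = .60 each of the second and fifth categories receives roughly 5% of the cases, and the outer categories remain essentially empty until validity is high.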


THE INTERPRETATION OF VALIDITY COEFFICIENTS 
IN PLACEMENT AND MEASUREMENT 


When Chapter IV examined the relation of test validity to utility of selection
decisions, the evaluation of the validity coefficient in other uses of tests was
held in abeyance. We may now summarize relations encountered in both
selection and placement.


In placement, a test is used to locate the person among several treatments,
and no one coefficient can be identified as the validity of the test. The test is
likely to have a different correlation with criterion for each treatment. We
have therefore distinguished three elements which determine payoff when
prediction is made from a single test, namely r_ys [...]. The variable s
is common between a particular test and all treatments. So long as all tests
under consideration are linked to the treatments through the same s, we can
develop a general function relating utility to the validity of the tests as a
measure of s, i.e., in terms of r_ys.
In this case, gain in utility in placement with fixed treatment and fixed
quotas is a simple linear function of r_ys. This conclusion depends on the
assumption that there is a linear relation of expected payoff to test score.
Assuming in addition a second-degree surface relating e, s, and t, the benefit
in adaptive placement with fixed or adjusted quotas is a function of r_ys². The
results for fixed quotas apply to placement into any number of categories. Even
if n becomes infinite, the linear or r² function applies, according to whether
treatments are fixed or adaptive.
Typical decisions involving measurement are best interpreted as adaptive-
placement problems. The person who "measures" in order to apply the
treatment optimal for the obtained score is engaged in adaptive placement with
an (hypothetically) infinite number of alternative treatments to be considered.
Therefore the coefficient of determination r_ys² appears to be the proper index
for evaluating the usefulness of a test in personnel decisions calling for
numerical measurement.


The coefficient of forecasting efficiency has a long history, and has been
widely used as an index of test merit. But in the light of these results we must
now ask whether it has any proper place in test theory. If it is a proper index
of test efficiency, there must be some decision problem and associated payoff
function such that the coefficient is proportional to benefit from testing. The
coefficient of forecasting efficiency evaluates a test by the absolute error of
estimate, relative to the error of a chance estimate. This evaluation would be
appropriate only in a problem where payoff declines linearly with error. One
type of payoff surface (and to the best of our knowledge only one) can be
described for which this is true, namely a ridge-shaped surface, formed by the
intersection of two planes. In such a surface, the payoff function for any single
treatment is like that in Figure 27, where the treatment shown is that optimal
for aptitude s. The coefficient of forecasting efficiency can be justified in a
decision problem where the payoff function takes this shape, and the question
therefore is whether such a function would occur in practice.

Figure 27. Possible payoff function for a single treatment (payoff e_st
plotted against aptitude s)

Outside the personnel field, measurement problems can be imagined where
this function might apply. Suppose nails are sorted by size into bins, by a pro- 
cess involving some error. Then the carpenter who reaches for a certain size 
of nail for a given job will perhaps get one of incorrect size. The more the 
nail deviates in either direction, the worse suited it is: too small a nail will 
not hold the work firmly, too large a nail is ugly and risks splitting the wood. 
Perhaps the payoff function is neither linear nor sharply discontinuous, but 
over a range of sizes Figure 27 might be a reasonable approximation. Simi- 
lar functions might be involved in the payoff from, say, various sizes of clothing 
allotted to a person, or various concentrations of electrolyte in a plating bath. 
Therefore we can argue that absolute error of estimate is a suitable loss func- 
tion for some measurement problems. 

But does it apply in personnel testing? At this point, we would say that it 
does not. Overestimating a score leads to assigning a person to a treatment 
he may be unable to profit from. But underestimating aptitude seems unlikely 
to have consequences like those of underestimating the size of a nail. The man 
who is placed in a treatment "too simple for him" will ordinarily outperform 
the man of lesser ability. The loss lies in the failure to assign the man to a
treatment where he could do even better. In some situations boredom might
cause inferior performance on the part of "excessively able" men, but we have
not identified a situation where the payoff function seems likely to be sharply
ridged; instead, the rising-then-declining function is likely to be a smooth,
gradual curve. We therefore conclude that the coefficient of forecasting
efficiency is not a suitable index for describing the value of a test in personnel
decisions. We may have overlooked some type of decision which has a payoff
function justifying this index, but it is surely not appropriate for typical
selection and placement decisions. The person who proposes to use 1 - √(1 - r²)
as an index must bear the burden of proof, showing that this index does
correspond linearly to gain in utility for some particular problem.

Recent writings have distinguished the value of a test for screening or
selection from its value in other uses. The Brogden linear relation or the Taylor-
Russell function were connected specifically to coarse screening, and have
been regarded as inapplicable to precise decisions or prediction for an
individual (see, for example, (70)). The coefficient of forecasting efficiency or the
standard error of estimate is ordinarily invoked for these latter cases. The
coefficient of determination r² is not often mentioned directly as an index of
efficiency, although it is prominent in test theory.


Our results, taken all together, lead to the following quite different views: 


1. There is no single "validity coefficient" on which the contribution of
a test depends, save in the case of fixed-treatment selection. In a
placement problem, a measurement problem, or an adaptive-selection
problem, we can only compare the contributions of various tests
measuring the same aptitude continuum. Statements can then be made
about the value of the test as a function of r_ys.

2. In selection with treatment and quota fixed a priori, utility is a linear
function of the correlation between test score and evaluated outcome,
i.e., of the validity coefficient. In fixed-treatment, fixed-quota
placement, utility is a linear function of the several validities
corresponding to the various treatments. It is also a linear function of the
validities of the test for predicting the differences in payoff between pairs
of treatments. Finally, utility may be interpreted as a linear function
of r_ys in fixed-treatment selection and placement.

3. When we may regard all successful men as making equal contribution
to the institution, the Taylor-Russell tables are more appropriate for
evaluating selective efficiency in fixed treatment than the linear
function. Otherwise, the Taylor-Russell results are best regarded as a
rough approximation to the linear relation.

4. In placement or measurement with adaptive treatment, utility is a
function of r_ys², which has the form of a coefficient of determination.
To compare tests measuring different aptitudes, the parameter a of
the payoff surface for each aptitude must also be taken into account.

5. Selection with adaptive treatment combines the features of fixed and
adaptive placement. The relation of utility to validity involves both
r_ys and r_ys² terms.

6. Where treatments are fixed, but quotas may be adapted according to
the value of r_ys, the relation of utility to validity becomes too
complex to be described in a simple index.

7. No case of practical importance has been found where the coefficient
of forecasting efficiency is a suitable index of test efficiency.
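The three indices contrasted in this summary differ sharply at the validities usually encountered. The short tabulation below is only an illustration; it compares the validity coefficient r, the coefficient of determination r², and the coefficient of forecasting efficiency 1 - √(1 - r²) for a few values of r chosen arbitrarily. The function name is hypothetical.

```python
import math


def forecasting_efficiency(r):
    """Coefficient of forecasting efficiency, 1 - sqrt(1 - r^2)."""
    return 1 - math.sqrt(1 - r * r)


if __name__ == "__main__":
    print("   r     r^2     E")
    for r in (0.20, 0.30, 0.50, 0.70, 0.90):
        print(f"{r:5.2f}  {r * r:5.3f}  {forecasting_efficiency(r):5.3f}")
```

For any validity between 0 and 1 the forecasting-efficiency index is the smallest of the three and r the largest, so the choice of index strongly affects how useful a test of moderate validity appears.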


7 


EFFICIENT TESTING PROCEDURES 


Among the chief problems of the decision maker is the efficient design of 
testing procedures. The time he can devote to testing is often severely limited, 


and how to use that time efficiently is a matter of concern. 


ADJUSTMENT OF TEST LENGTH 


One way in which the efficiency of testing can be increased is by adjusting 
a test or battery to that length which would yield maximum gain in utility for 
the decision problem in which it will be used. When a particular collection of 
items is used, gains in efficiency can be obtained by altering the number of 
items. With a battery of tests treated as a composite it may be desirable not 
only to alter the lengths of the separate tests, but also to omit some tests en- 
tirely despite the fact that they improve the multiple correlation. 


Optimum length for a single test 


The validity of a test rises as it is lengthened, the increase being gradual 
unless the units forming the test have low intercorrelations. Since validity 
does increase indefinitely, there might seem to be no limit to the desirable 
length of test. A utility analysis, however, demonstrates that there is an opti- 
mum length, beyond which increases in cost outweigh benefits from greater 
validity. This point was recognized by Hull. 


Speaking of a battery composed by adding units with validity .40 and inter- 
correlation .20, he wrote: 


The tenth [unit] adds only 1.2 points to the correlation yield. The 
question inevitably arises whether an increase of a single point or so 
in the correlation yield is worth the extra time and labor involved in 
giving and scoring an entire additional test unit. In any case it must 
be perfectly obvious that because of this law of diminishing returns a 
place must be reached sooner or later where the addition of a new test 
will not contribute enough to the prognostic value of the battery to
justify the incidental expense involved. (43, p. 262)
Despite this early statement, the validity-cost balance has generally been
ignored in discussions of test design, and has never been reduced to a definite
function.

To study the effect of lengthening a test we express the general equation
for placement or selection as follows:

$$\Delta U_{y_k} = B_{y_k d} - C_{y_k} \qquad [22]\ (5.3)$$

Here the subscript k indicates that we are considering a test of length k. B_yd
is the benefit from using test y in a particular decision problem d, and has
been defined for each type of decision so far considered.

The cost of a unit test may be divided into two portions, C_0 and C_1. The
first element C_0 is an initial cost of testing which may be assumed to be
constant, regardless of length. This "setup" cost takes into account assembling
subjects, giving directions, etc. The marginal cost C_1 of a unit test may be
assumed proportional to length, and includes examiner time, time of men tested,
scoring costs, etc.

When k similar units are combined to form a test, benefit from testing is
an increasing negatively accelerated function of k. Cost increases linearly
and must eventually exceed benefit. The difference between benefit and cost
is the net gain in utility.

The benefit must be specified for a particular unit test and a particular
decision problem. As Appendix 5 shows, somewhat different equations ((5.4)
and (5.13)) describe the increase in benefit with length in fixed-treatment
decisions and in adaptive-placement decisions. Figure 28 shows the change in
utility with length for both types of decisions. While these curves are based
on a particular set of parameters (B_{y_1 d} = 1, unit intercorrelation = .30,
C_0 = .05, C_1 = .02), the general shapes of the curves would be similar for
other parameters. Change of parameters would shift the maxima and alter the
curvature. No direct comparison of the curves for fixed and adaptive treatment
is warranted, and no meaning is attached to the fact that the adaptive curve is
higher in the figure, since the parameter B_{y_1 d} is defined differently in the
two cases.

Figure 28. Utility as a function of test length (horizontal axis: length of
test, k)
For any test, there is some one best length (unless the test is too invalid 


to ever repay its cost). If the test is shorter than this optimum, the tester is 

not attaining full utility. As a test is lengthened beyond the optimum value, 

AU declines and eventually becomes negative. The utility curve is fairly flat 

over a large range of k, and it is therefore not critical to determine precisely 
the best k for a given situation.

Figure 29. Optimum length of test as a function of other parameters (left
panel: fixed treatment; right panel: adaptive treatment; horizontal axes:
benefit-cost ratio B/C)

The left portion of Figure 29, based on (5.6), shows the optimum k for fixed
treatment as a function of other parameters. The right portion (from (5.16))
shows the optimum for adaptive placement. In both cases, the optimum length
of test increases as cost C_1 decreases, benefit B_{y_1 d} increases, and/or the
intercorrelation of units decreases. In selection, shorter tests should be used
as the selection ratio departs from .50. The optimum length does not depend on
C_0, the "set-up cost". An optimum length of test for adaptive selection also
exists (5.19). Adequate cost estimates are not available to indicate the range
of B/C in typical practical situations. If the parameters employed in Figure 29
are realistic, many present tests are too long for greatest efficiency.
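The existence of an optimum length can be illustrated with a rough calculation. Equations (5.4), (5.13), (5.6), and (5.16) are not reproduced in this chapter, so the sketch below substitutes a stated assumption: validity grows with length according to the standard lengthening formula, r_k = r_1 √(k / (1 + (k - 1) r_11)), benefit is proportional to validity in fixed treatment and to squared validity in adaptive treatment, and cost is C_0 + k C_1. The parameter values are those quoted for Figure 28; all function and parameter names are hypothetical.

```python
import math


def benefit(k, unit_benefit, unit_intercorr, adaptive=False):
    """Benefit of a k-unit test under the lengthening assumption in the lead-in."""
    growth = k / (1 + (k - 1) * unit_intercorr)  # ratio of squared validities
    return unit_benefit * (growth if adaptive else math.sqrt(growth))


def net_utility(k, unit_benefit=1.0, unit_intercorr=0.30,
                c0=0.05, c1=0.02, adaptive=False):
    """Benefit minus set-up cost c0 and per-unit cost c1 * k."""
    return benefit(k, unit_benefit, unit_intercorr, adaptive) - (c0 + c1 * k)


def best_length(max_k=200, **params):
    """Whole-number length k with the greatest net utility."""
    return max(range(1, max_k + 1), key=lambda k: net_utility(k, **params))


if __name__ == "__main__":
    print("fixed treatment, optimum k:   ", best_length())
    print("adaptive treatment, optimum k:", best_length(adaptive=True))
    print("with tenfold set-up cost:     ", best_length(c0=0.50))
```

Raising c0 lowers the whole curve without moving its maximum, which agrees with the statement that the optimum length does not depend on the set-up cost.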


Optimum battery length 


A similar effect occurs when tests are combined into a composite predictor, 
as is usual with test batteries. The multiple correlation rises slowly as the 
battery is augmented. In fixed treatment, the gain from adding the vth test is
ΔR - C_v, where ΔR is the resulting increase in the multiple correlation
and C_v is the cost of the vth test. A test which, used as a sole predictor,
contributes sufficient validity to be worth its cost may not add enough in a
battery to be profitable (cf. (5.10) and (5.11)).

Previous writers have suggested ways of determining optimum composition 
of a battery, when its length is fixed. Long and Burr (49) consider the case 
where one is dealing with tests whose lengths are fixed but not uniform. They 
show that the conventional procedure for choosing tests to form the battery 
should then be modified. They indicate how to select the best combination of 
tests for any specified testing time. Horst (39) goes further, altering the 
lengths of tests in order to maximize the multiple correlation for any fixed 


total testing time. His argument is parallel to ours save that he does not 


introduce a set-up cost for each new test. Taylor (63) provides a clear dis- 


cussion of the Horst solution, with examples. 

While these solutions consider the design of a battery with a specified 
length, they do not take the further step of determining the optimum length. 
As we have indicated above, the cost of additional testing at some point
outweighs benefit from improved prediction. Using (5.10) and (5.11) in an
iterative procedure similar to that of Horst, it is possible to build up a battery
approaching the optimal as closely as desired both in length and composition
for a fixed-treatment selection problem. Probably one could devise a more
direct method, taking into account the type of decision problem, the cost of
each test, its benefit at unit length, and all test intercorrelations.
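A crude stepwise version of such a build-up can be sketched as follows. It is not the procedure of (5.10) and (5.11), which are not reproduced here, nor Horst's method; it simply adds, at each step, the test whose increase in the multiple correlation most exceeds its cost, and stops when no remaining test repays its cost. Test lengths are held fixed, and the validities, intercorrelations, and costs are invented for illustration.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(n):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def multiple_r(validities, intercorr, subset):
    """Multiple correlation of the criterion with the tests listed in subset."""
    if not subset:
        return 0.0
    rxx = [[intercorr[i][j] for j in subset] for i in subset]
    rxy = [validities[i] for i in subset]
    weights = solve(rxx, rxy)  # standardized regression weights
    return math.sqrt(sum(w * r for w, r in zip(weights, rxy)))


def build_battery(validities, intercorr, costs):
    """Add tests one at a time while the gain in R exceeds the cost of the test."""
    chosen = []
    while True:
        current = multiple_r(validities, intercorr, chosen)
        best, best_gain = None, 0.0
        for v in range(len(validities)):
            if v in chosen:
                continue
            gain = multiple_r(validities, intercorr, chosen + [v]) - current - costs[v]
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:
            return chosen
        chosen.append(best)


# Hypothetical data: test 1 is worth its cost alone but is redundant with test 0.
VALIDITIES = [0.50, 0.45, 0.30]
INTERCORR = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
COSTS = [0.05, 0.05, 0.05]

if __name__ == "__main__":
    print(build_battery(VALIDITIES, INTERCORR, COSTS))
```

In the invented data the second test would repay its cost as a sole predictor, yet it is never added, because it contributes almost nothing beyond the first test.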

The problem of maximizing the efficiency of a battery of tests can be fur- 
ther generalized to situations where many decisions are to be made from the 
obtained information. Such cases have been treated by Horst and others and 


will be discussed in the following chapter. 


Distribution of effort in two-stage sequential testing 


The tests used in a two-stage sequential plan should also be designed for
maximum efficiency, the lengths of the first and second tests being adjusted
according to their unit validities and costs. We deal with distribution of effort
within a battery of fixed total length and cost; the optimal length of a two-stage
battery has not been studied. No comparable problems arise in the usual
multi-stage plan, where the tests used at successive stages are identical in
validity and cost.

The ideal distribution of effort can be examined by computations similar
to those in Chapter V. These computations are practical where the battery
may be divided into units of equal validity and cost, but not for heterogeneous
combinations. A given homogeneous battery may be divided in any proportion
between the two stages of testing. For purposes of illustrating the effect of
various divisions on validity, we have divided the battery into 20 identical units.
Any number k of these units may be employed as the first screen.

Figure 30. Utility in two-stage selection with varying distribution of effort

In this example, the intercorrelation of units is set at .10, C_1 at .005, and
C_0 at 0. (As in Chapter V, utility and costs are relative to σ_e r_ye.) Figure 30 shows
how the gain in utility from Single-screen, Battery, and Sequential procedures 
is altered by changes in k, with selection ratios .50 and -10. The utility from 
Battery is constant, since the cost of the battery and the validity do not depend 
on the way it is divided. Utility from Single-screen increases with length in 
the manner of Figure 28. With the parameters here assumed, the first screen 
would profitably be lengthened beyond 20 units when the selection ratio is .50. 
The optimum length for the first stage of a two-stage battery, however, is 
about 10 units -- much less than the optimum length for the same test used as 
a single screen. For a selection ratio of .10 (or .90), the optimum single 
screen is one with k approximately 9, and the optimum length of first stage is 
about 8. If the first stage of testing is much shorter than this optimum, Battery 
is nearly as good as Sequential. If the first stage of Sequential procedure is 
much longer than the optimum, Sequential offers little advantage over the 
Single-screen. The proportion of the battery assigned to the first stage should
decrease, other things being equal, when the selection ratio becomes more
extreme, the intercorrelation of unit tests becomes higher, and cost per unit
test becomes higher, or the product σ_e r_ye becomes less.


MULTIPLE-STAGE TESTING 


A type of sequential testing more general than we have hitherto considered
employs a large number of stages for individuals whose assignment is difficult
to determine, but for other individuals makes terminal decisions after one or
a few stages. Such a multi-stage sequential testing process distributes effort
very efficiently. The length of test is adjusted to the individual on the basis of
the information about him as it is received.

Wald and the Columbia Statistical Research Group (referred to below as
SRG) have developed multi-stage sequential strategies for a great variety of
problems, including some of those which concern us. To date, multi-stage
procedures have been given detailed attention only under the condition that the
same test is used at every stage, i.e., when the decision maker continues at
every stage to collect further samples of the same sort of information. Wald
(69, p. 114 ff.) has developed a "recursion" formula which specifies the ideal
sequential procedure and permits determination of the benefit from such a
procedure, provided one knows the a priori distribution of aptitudes and the
cost of testing. This method is laborious to treat computationally, and the
older SRG methods may be regarded as an adequate approximation. The SRG
methods are designed for situations where one lacks prior knowledge of the
distribution of true scores, and where outcomes are specified in terms of
risks of various sorts of errors rather than in terms of utility units. We have
been able, however, to adapt the methods to certain of our fixed-treatment
problems. For those readers interested in the procedures, we summarize
the SRG methods in Appendix 6 and indicate how the optimal multi-stage
strategy can be determined for any specified parameters. The present discussion
confines itself to presenting typical results.
Where persons are to be divided into two groups, each to receive a fixed
treatment, a desired division point is established. Then, on the basis of
the information y available after any stage of testing, one can estimate the
probability that an individual is above this point. When this probability
approaches one or zero, one can confidently make a terminal decision regarding
the individual. The optimal strategy provides two scores for each stage of
testing which constitute the boundaries of the region in which testing continues.
As soon as an individual crosses either of these boundaries, testing for him
ceases. A person attaining the higher of these scores is assigned to one
treatment, and a person falling at or below the lower score is assigned to
the other treatment.
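
The stopping rule just described can be sketched in a few lines of Python. This is an illustration only: the function name, the treatment labels, the scores, and the boundary values are all hypothetical, and no attempt is made here to compute the optimal boundaries discussed in Appendix 6.

```python
from typing import Sequence, Tuple


def sequential_assignment(
    unit_scores: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> Tuple[str, int]:
    """Apply a two-boundary stopping rule to one person's unit-test scores.

    lower[i] and upper[i] are assumed boundaries on the cumulative score
    after stage i + 1. Returns the decision and the number of stages used.
    """
    total = 0.0
    stages = 0
    for score, low, high in zip(unit_scores, lower, upper):
        stages += 1
        total += score
        if total >= high:   # attains the higher score: one treatment
            return "treatment A", stages
        if total <= low:    # at or below the lower score: the other treatment
            return "treatment B", stages
    return "undecided", stages  # boundaries never crossed in the stages given


# Illustrative boundaries that close in as testing proceeds (assumed values).
LOWER = [-2.0, -1.5, -1.0, 0.0]
UPPER = [2.0, 1.5, 1.0, 0.0]

clear_case = sequential_assignment([2.4, 0.3, 0.1, 0.2], LOWER, UPPER)
borderline = sequential_assignment([0.3, -0.2, 0.4, -0.1], LOWER, UPPER)
print(clear_case, borderline)
```

In this sketch the person with an extreme first score is assigned after one stage, while the borderline person is carried through every stage, which is the sense in which the length of test is adjusted to the individual.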

The utility from a multi-stage test depends on the cost of the unit test,
the intercorrelation of unit tests, and the relevant payoff functions. To study
the gain in utility from a multi-stage procedure, benefits and amounts of
testing required were computed, using an ILLIAC program for multi-stage
testing devised for us by Kern Dickman. The program applies to decisions
with two categories. Here we discuss the results in terms of selection rather
than placement.

Figure 31. Utility in multi-stage selection as a function of selection ratio

As a matter of convenience, we have defined the unit test as one for which
the intercorrelation of unit tests is .10. In practice, one might employ larger
or smaller units, with
correspondingly larger or smaller intercorrelations, but this change would
affect results very little. Cost and utility are expressed relative to Fey oe” 
At each true score total cost is proportional to the number of stages of test- 
ing, and net utility is determined by subtracting cost from benefit. For any 
cost of unit test and selection ratio, there is an optimum strategy. When per- 
sons are selected by this strategy, the utility varies with selection ratio in 
the manner shown in Figure 31. These curves are rather similar to those 
found for two-stage sequential testing. The relation of utility to cost of the 
unit test is depicted in Figure 32. 

Figure 32. Utility in multi-stage selection as a function of cost (horizontal axis: cost C)

The advantage of sequential testing over non-sequential testing is demonstrated
in Figure 33. Here, one curve shows the benefit from sequential testing
assuming $C_1 = .01$. The other curve indicates the utility to be expected
if every man were given a test of uniform length, the length of test used at any
selection ratio being exactly equal to the average length of test for the
sequential procedure. Calculations for the non-sequential curve are only approximate.
The sequential plan where the same total amount of testing is distributed
unequally over the men tested is superior at any selection ratio to the
conventional procedure. More than this, because the lengths of the non-sequential
tests are very near the optimum in each case, it is not possible by any
adjustment of length to increase the utility of the non-sequential plan to the point
where it equals the sequential procedure. That is to say, with this test the
utility reached by sequential selection is unattainable by a non-sequential method.

Figure 33. Utility from comparable non-sequential and multi-stage procedures


Sequential testing for placement purposes 


Sequential methods of obtaining information can be applied to placement
decisions other than the two-treatment case discussed above. Sobel and Wald
(59) describe a procedure for deciding which of three intervals along a
continuum a parameter falls into. They employ assumptions such as those
discussed in Appendix 6. This method can be applied directly to placement
decisions in personnel work. While more general solutions involving greater
numbers of categories and other assumptions can probably be developed,
none has yet been described in a practically usable form. The principles of
sequential placement are adequately illustrated by the Sobel-Wald treatment
of the three-category problem.

No study has been made of the contribution of sequential placement for more
than three treatments where payoff under each treatment is a linear function
of score. The strategy specifies that testing will continue only if the total
score after any stage of testing falls between certain values. A terminal
decision is made for each person as soon as the risk of error is reduced to
tolerable size. Testing is most extensive for a person whose true score falls
near one of the boundaries. It is evident that considerable time can be saved
in testing persons far from any borderline, but the saving will be small for
persons near the borderline. As the number of treatments increases, there
are more borderlines and more persons must be given long tests. Sequential
testing for placement into many categories, therefore, appears likely to have
limited value as the number of categories becomes large. Indeed as $n \to \infty$,
i.e., in a measurement problem, it has been established that the sequential
method has no advantage (62, 72).

Designing tests according to a sequential pattern can perhaps improve 
their efficiency for other than selection or placement problems. While most 
test theory assumes the test to be identical for everyone, some individual 
tests have involved decisions at one stage as to what questions will be admin- 
istered next. Thus the Stanford-Binet Scale employs the vocabulary score to 
determine a trial basal age. In measurement of sensory thresholds, the ascend- 
ing and descending trials of the Method of Limits are similarly sequential in 
nature. Recently developed answer sheets which permit immediate scoring of 
responses open up the possibility of sequential procedure in group tests. 

One situation where such a procedure promises to be valuable is in analytic 
proficiency or achievement measurement. Here, many different types of attain- 
ment need to be checked so the subject can be given further training where 
necessary. If a minimum performance standard can be established for each 
type of attainment, a short test can be administered for each. When these tests 
are scored, decisions can be made at an acceptable level of confidence regard- 
ing those dimensions where the person's score is extremely high or low, and 
he can be asked to take a further test on each dimension where his score is 
less extreme. This procedure is repeated until a decision has been made re- 
garding every dimension. The total amount of testing required for any individ- 
ual is considerably less than would be needed if sufficient items were admin- 
istered non-sequentially to make sure that every decision reaches the same 
minimum level of confidence. Somewhat more complicated variants of the 
procedure can be used for multidimensional tests where the tester is interested 
in identifying the person's highest aptitude or his salient personality charac- 
teristics, etc. 

Attention may be drawn to a paper by Somerville (60) which appears to 
represent a first step toward the mathematical study of sequential processes 
where each stage of decision making indicates what hypotheses should be tested 
next. Somerville studies a two-stage sampling problem where the first stage 
obtains data on many dimensions, and only the dimension with the highest esti- 
mated mean is to be tested in the second stage. In this problem, he is able to 
determine the optimum length for the first stage. 

There is another possible use of sequential technique in designing single- 
score tests. In two-category placement, it has sometimes been found desirable 
to use a "peaked" test where the difficulty of all items is chosen so as to be 
maximally discriminative at the critical aptitude level. With more categories, 
the efficiency of testing could perhaps be improved by adjusting the difficulty 


of items according to the person's performance on the preceding items. For
example, if there are three categories, separated by aptitude s' and s", there
could be two levels of item difficulty. Level 1 would be such as to discriminate
most sharply between persons above and below s'; level 2 would discriminate
at s". The test might begin with five items at level 1, which would be
scored immediately. Persons who pass (say) three or more would be directed
to try a group of items at level 2, while the remainder would take further items
at level 1. At the end of this stage, those successful at level 1 (all items
considered) would move to level 2. Those who had attempted level 2 would be
divided, some moving back to level 1, others remaining at level 2. Those who,
in the end, succeed at level 2 are assigned to the top category, and those who
fail at level 1 to the bottom category; the remainder go into the middle
category. Such a procedure can be varied by introducing more categories and
more levels of difficulty, by altering item intercorrelation, by altering the
number of items at each stage, and so on.

We have made only a sketchy exploration of this procedure. It appears
that such a sequential procedure is advantageous only when the items of the
test are highly homogeneous in content, i.e., have high tetrachoric
intercorrelations. Then, if the items are sufficiently homogeneous in difficulty within
levels, and sufficiently widely spaced between levels, the various subtests will
discriminate between individuals on the basis of their position on the aptitude
continuum. In general, the higher the tetrachoric correlations, the greater
the number of levels which may be used and the closer together may be the
division points. It is well known (50) that item intercorrelations in educational
and psychological tests are rarely high enough to yield "difficulty factors",
unless they are applied to a group having an exceptional range on the attribute
measured. Very high intercorrelations are presently encountered only in a
few unusual instruments, such as Guttman-type attitude scales. These devices
would seemingly be benefited by further development of sequential test designs.

A sequential procedure for administering a scholastic aptitude test was
described by Krathwohl and Huyser to the 1956 meeting of the American
Psychological Association. Preliminary results (unpublished) indicate that the
time required for obtaining estimates of ability can be substantially reduced
by this method.


THE BANDWIDTH-FIDELITY DILEMMA 


In deciding whether to use a test, the practical worker considers not only 
its validity but also its range of applicability. There is an obvious difference 
between the value of a measure of mathematical proficiency, used in several 
decisions about a student by his teachers and counselors, and that of an equally 
valid measure of drawing ability applied to only one or two decisions. The 
way a test is scored affects its value; a single overall score will usually have 
a narrower range of application than a pattern of sub-scores on the same test.
Utility analysis suggests that the contribution of a test be judged over all
decisions, rather than in terms of validity for any one decision. It is this total
contribution to the institution which determines which tests, or which scores,
should be used.


An example will show the significance of this argument. Instruments available
to classroom teachers for evaluating pupil adjustment, such as questionnaires,
sentence completion tests, and sociograms, have limited but positive
validity. A pupil's poor score on such an instrument can influence many of
the teacher's decisions: decisions regarding how to discipline him, how much
pressure for greater achievement to apply, what social activities to suggest,
and whether to make a diagnostic case study of him. Although the adjustment
inventory improves any single decision much less than a test of, say, spatial
ability improves the decision to which it is relevant, the former test,
applicable to many important decisions, is likely to be more beneficial.

Working with a person over a period of time, a teacher or therapist applies
a great variety of treatments. Each time, he classifies the person as requiring
(ready for) the treatment or not. We may regard the test interpreter as
having many hypotheses of the form: "Treatment $t_j$ is suitable for this
individual". A decision to accept or reject is made regarding each hypothesis.
These are typically placement decisions, since the person not given the
treatment remains within the institution. These decisions are independent
whenever one decision does not affect the others. We shall term a set of such
independent decisions a compound decision. Our consideration of the contribution
of a test over decisions will be limited to this case.
A model involving independent decisions is admittedly oversimplified,
since decisions about a person are likely to be integrated rather than
independent. The decision of a counselor to recommend remedial reading for a
student may be independent of his decision to try to broaden the student's
participation in social affairs, but it is more likely that recommending remedial
reading will dictate postponing expansion of the student's social life. This
example might be regarded as allowing four alternative treatments (modify
reading and social program, modify reading alone, modify social program
alone, modify neither). If the probabilities that a person will be given the four
treatments are predictable from the probabilities that he will be given reading
and social treatments, considered separately, we have a compound placement
decision (i.e., two independent placement decisions). If decisions are not
independent, the problem involves four categories. While these categories might
all be predicted from a single test, it is much more likely that the appropriate
predictors will involve at least two factors, and therefore this problem is no
longer one of placement. Such general classification problems are more common
in industry and military personnel work than are compound placement
decisions. Even where decisions are not strictly independent, however, the
compound placement model may be a useful approximation.

Our inquiry involves two sub-questions. First, it is necessary to find an
expression for the value of a test or battery used for compound decisions.
Secondly, we employ this expression to identify general principles regarding
the optimum design of the testing battery in problems involving compound
decisions.


UTILITY OF A TEST IN COMPOUND DECISIONS 


We recall that the gain from making any one decision d is

$\Delta U_{yd} = B_{yd} - C_y$    [22]

$B_{yd}$ is the benefit obtained when test y is used as a basis for this decision,
and depends on the character of the decision problem, the treatments, and
the location of cutoffs, as well as on the validity of the test for whatever
aptitudes are involved.

If the same test, scored in the same or a different manner, is used for
several decisions about the same persons, further benefit is obtained without
further cost. The grand total contribution of testing is

$\sum_d \Delta U_{yd} = \sum_d B_{yd} - C_y$    [23]

The contribution of the test is more or less proportional to the number of
decisions for which the set of scores can be used. A test which applies to v
similar fixed-treatment decisions contributes the same net utility as another
test having v times as much validity which applies to only one. In adaptive
treatment, where $B_{yd}$ is a function of the squared validity, a test must apply
to $v^2$ similar decisions before it is as profitable as another test having v
times as much validity for one such decision.
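
The trade between validity and number of decisions can be checked with a short calculation. The sketch below assumes, for illustration only, that benefit is proportional to validity in fixed treatment and to squared validity in adaptive treatment; the function names, the scale factor, and the cost figure are hypothetical.

```python
import math


def total_gain(benefits, test_cost):
    """Grand total contribution: benefits summed over decisions, less one test cost."""
    return sum(benefits) - test_cost


def fixed_benefit(validity, scale=1.0):
    # Assumption: fixed-treatment benefit is proportional to validity.
    return scale * validity


def adaptive_benefit(validity, scale=1.0):
    # Assumption: adaptive-treatment benefit is proportional to validity squared.
    return scale * validity ** 2


v = 4
low_validity = 0.10
cost = 0.05  # hypothetical cost, taken to be the same for both tests

# Fixed treatment: v decisions at validity r match one decision at validity v * r.
broad_fixed = total_gain([fixed_benefit(low_validity)] * v, cost)
narrow_fixed = total_gain([fixed_benefit(v * low_validity)], cost)

# Adaptive treatment: v ** 2 decisions are needed to match.
broad_adaptive = total_gain([adaptive_benefit(low_validity)] * v ** 2, cost)
narrow_adaptive = total_gain([adaptive_benefit(v * low_validity)], cost)

print(math.isclose(broad_fixed, narrow_fixed),
      math.isclose(broad_adaptive, narrow_adaptive))
```

Under these assumptions a test used for only v adaptive decisions falls short of the more valid single-purpose test, which is why the break-even number of decisions rises from v to $v^2$.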

One must be cautious in implying that tests can compensate for poor vali- 
dity merely by being used repeatedly. When the validity of a test is near zero, 
multiplication is most unlikely to raise the total utility to a satisfactory level. 
Moreover, a positive validity coefficient computed on one sample does not 
guarantee usefulness for a test. Sampling error of low correlations is substantial,
and our argument is based on coefficients established for the population.


DISTRIBUTION OF EFFORT IN A MULTI-SCORE BATTERY 


The test designer and the user of tests frequently have to choose between 
careful estimation of a single variable and more cursory exploration of many 
separate variables. Tests may be constructed to yield separate scores on a 
number of diverse, internally homogeneous scales, or to provide a single 
measure loaded with the general factor underlying items. Particularly where 
the test is to be used in a variety of decisions rather than to predict one single 
criterion, questions arise as to whether to establish independent part scores 
or to obtain a careful measure of a single attribute (48). The person choosing 
published tests for a testing program faces similar questions, since he can 
use available time to measure one or two variables by means of long tests, 
or employ a much larger number of short tests measuring a variety of 
characteristics. 

This dilemma may be described in the language of the communications 
engineer as a choice between "wideband" and "narrowband" tests. In using 
a particular channel, such as a telegraph wire, one may either crowd many 
messages into a period of time, or give a single message slowly and repeti- 
tively. The former, more varied message has greater "bandwidth". The 
wideband signal transmits more information, but the clarity or dependability 
of the information received is less than for the narrowband signal except 
under ideal communication conditions. Random errors can seriously confuse 
the wideband signal; this is spoken of as a lack of fidelity. The tester's situa- 
tion is analogous. If he concentrates on facts relevant to a single decision, 
he gets a much more dependable answer than if he spreads his effort. But 
by concentrating, he leaves all his other questions to be answered on the basis 
of chance alone. 

This suggests that in any decision situation there is some ideal compromise
between variety of information (bandwidth) and thoroughness of testing
to obtain more certain information (fidelity). For the purposes of designing
electronic communication circuits, such ideal compromises have been worked
out within the Shannon mathematical theory of communication (57). Because
of its close analogy to testing problems, we once expected the Shannon theory
to provide a basis for test design. Upon close examination, the Shannon model
does not fit the tester's problem (23, 24). Though information theory is
suggestive, the tester's problem must be treated within the more comprehensive
mathematical structure of decision theory.
Previous solutions. We wish a maximally efficient strategy for gathering
information when a large number of decisions are to be made. These decisions
depend on various aptitudes, which in the general case may or may not be
correlated. Some relatively general mathematical solutions have been offered,
but none of them leads to easily comprehended or communicated results.
Certain restricted solutions have appeared in the literature. Elfving (32) and
Chernoff (18) suggest a way to determine the optimal distribution of effort
which minimizes the total squared error of estimate. Chernoff shows that this
least-square index leads to the "locally optimum" division of effort under a
wide range of conditions, provided error of estimate is small. The
Elfving-Chernoff method assumes that all decisions are equally important, and does
not make allowance for a set-up cost $C_0$. Horst (42) has employed similar
assumptions to obtain an optimal battery for fixed total testing time, where
the battery is to predict various criteria.

These investigators have assumed that the tests are intended to estimate the
standard score on each criterion as accurately as possible, there being a
different criterion for each decision. The index used is the sum of the squared
errors of estimate for the various criteria, or what is equivalent, the sum of
$R^2$ over criteria, where R is the multiple correlation of the test scores with
the criterion. Such an index clearly applies to measurement problems where
treatments will be adapted to the estimated criterion scores of the individual
(cf. eq. [21]).

Placing its entire emphasis on the validity coefficients as it does, this index
is not a fully satisfactory basis for judging the contribution of a test. When all
criterion measures are reduced to standard scores, they are in effect assumed
to be equally important. Secondly, the index leaves quotas or cutting scores
out of account. While these are irrelevant in measurement with an infinite
number of categories, the quotas become an important factor in utility when
tests are used to select among a few alternative treatments. Finally, it is to
be noted that the least-squares criterion is relevant to adaptive-treatment
decisions, but according to our previous chapters is not a suitable index for
fixed-treatment decisions. This index would, however, agree with our analysis
in warning against judging test merit by the validity coefficient against each
single criterion considered separately.

Our treatment will take into account the parameters of the decision problem.
In so doing, however, we find it necessary to impose other serious restrictions
so that we deal with only certain special cases.
Equivalent decisions and tests. The first special case to be considered is
that where each of a set of terminal decisions depends on a different aptitude,
and there is a test which measures each aptitude. We assume that $B_{y_1 d}$,
$r_{y_1 y_1}$, $C_0$, and $C_1$ are the same for every test, i.e., that the decisions
and the unit test relevant to each are similar. The correlations among tests and
among outcomes are zero. Both set-up and marginal costs are assumed to depend
entirely on time, but set-up costs are independent of the length of the test.
Bandwidth is described by v, the number of dimensions to be tested.

The question is: If the tester has T units of testing time available, is he 
wiser to divide that time over all decisions or to concentrate on measuring 
one variable? Or is his best choice intermediate between these? When he
lengthens one test, he improves its contribution but is left with many decisions 
to be made on a chance basis; if he uses many short tests he can cover all his 
decisions, but with highly fallible information. 

With equivalent tests and similar decisions, the tester should divide his time 
equally among whatever number of tests he gives. This is true because the 
relation of benefit to length (eq. (5.1)) is convex, and therefore the contribution
corresponding to length $\tfrac{1}{2}(k_1 + k_2)$ is always greater than the average of the
contributions at $k_1$ and $k_2$. The problem reduces to determining the optimum
number of tests of uniform length. (If decisions and tests were not uniform, it 
would be desirable to use tests of different lengths and no simple statement 
about bandwidth could be made.) 
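
The equal-division argument can be illustrated numerically. The sketch below does not reproduce eq. (5.1); it assumes instead that validity grows with length according to the Spearman-Brown formula, and that benefit is proportional to validity (fixed treatment) or to validity squared (adaptive treatment). The function names and the lengths chosen are hypothetical. In current terminology the property being used is concavity of the benefit curve.

```python
def spearman_brown(r_unit, k):
    """Reliability of a test k units long, given unit intercorrelation r_unit."""
    return k * r_unit / (1.0 + (k - 1.0) * r_unit)


def relative_benefit(r_unit, k, adaptive=True):
    """Benefit of a test of length k relative to the unit test (assumed form).

    Validity squared grows in proportion to reliability, so the adaptive
    benefit ratio is r_kk / r_11 and the fixed benefit ratio is its square root.
    """
    ratio = spearman_brown(r_unit, k) / r_unit
    return ratio if adaptive else ratio ** 0.5


r_unit = 0.10
k1, k2 = 2.0, 8.0  # an unequal division of ten units between two tests

for adaptive in (True, False):
    even_split = relative_benefit(r_unit, (k1 + k2) / 2, adaptive)
    uneven_split = (relative_benefit(r_unit, k1, adaptive)
                    + relative_benefit(r_unit, k2, adaptive)) / 2
    print("adaptive" if adaptive else "fixed", even_split > uneven_split)
```

Under either assumed payoff the two tests of five units each yield more than one test of two units and one of eight.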

In fixed treatment, the contribution of any test as a function of length is as
was indicated in Figure 28. The constraint on testing time fixes a total
allowable cost $C_T$. Appendix 7 demonstrates that the contribution from a battery of
v uniform tests, each of length k and each used for a separate fixed-treatment
selection or placement decision, is

$\Delta U = v B_{y_k d} - C_T$    [24] (7.2)

where

$v = \dfrac{C_T}{C_0 + k C_1}$    [25] (7.9)

If $C_0$ is zero, $v = C_T / k C_1$. Substituting this value of v in [24], it can be
seen that utility is a monotonic decreasing function of k. Utility then increases
indefinitely as v increases (although v could never increase beyond the point
where each test contains only one item). If there were no initial cost it would
be profitable to increase bandwidth up to the limit of the number of decisions
to be made.
Where there is an initial cost, there is a limit to how short the tests can
be because extremely short tests cannot repay the set-up cost. The relation
of gain in utility, for all tests combined, to the number of tests has the general
form shown in Figure 34. (As a matter of convenience, $\Delta U$ and v are both
expressed relative to $C_T$ in this figure. $C_0$, $C_1$, and $B_{y_1 d}$ are set at .05, .02,
and .2 respectively.)

Figure 34. Relation of utility to bandwidth (vertical axis: $\Delta U$; horizontal axes: number of tests v, length of test k)


If the uniform decisions involve adaptive-treatment placement, the equation
analogous to [24] is

$\Delta U = v B_{y_1 d} \dfrac{r_{y_k y_k}}{r_{y_1 y_1}} - C_T$    [26] (7.14)

This function also is shown in Figure 34.

The optimum length of any one test (which, substituted in [25], gives the
optimum bandwidth) is obtained by maximizing [24] or [26] with respect to k. For
fixed treatment under the specified conditions the optimum length is

[27] (7.13)


[28] (7.25) 


It is to be noted that with the postulated uniform decisions the optimum
bandwidth does not depend upon $B_{y_1 d}$, but only on the cost and intercorrelations of
unit tests. Even more important, the optimum number of tests has no relation
to the number of decisions to be made (save as this provides an upper limit).
If the optimum v is five in a given situation the decision maker should measure
five dimensions, whether he is concerned with five decisions or one hundred.
Moreover, we can arbitrarily define the length of a unit test, say by adjusting
it to make $r_{y_1 y_1} = .10$. The optimum therefore depends entirely on the ratio
$C_0/C_1$ for the tests under consideration.
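
A numerical search makes the same point for adaptive treatment. The sketch below evaluates [25] and [26] over a grid of test lengths, taking $r_{y_k y_k}/r_{y_1 y_1} = k/(1 + (k-1) r_{y_1 y_1})$ from the Spearman-Brown formula. The parameter values are those quoted for Figure 34 with $r_{y_1 y_1} = .10$; setting $C_T = 1$ is an arbitrary choice of scale, and the function names are hypothetical. The closed-form value used as a check, $k = \sqrt{C_0 (1 - r) / (C_1 r)}$, is our own derivation from [26] under these assumptions and is not quoted from the text.

```python
import math


def number_of_tests(k, c_total, c_setup, c_unit):
    """Eq. [25]: number of tests of length k affordable within c_total."""
    return c_total / (c_setup + k * c_unit)


def adaptive_gain(k, benefit_unit, r_unit, c_total, c_setup, c_unit):
    """Eq. [26], with r_kk / r_11 = k / (1 + (k - 1) r) from Spearman-Brown."""
    v = number_of_tests(k, c_total, c_setup, c_unit)
    growth = k / (1.0 + (k - 1.0) * r_unit)
    return v * benefit_unit * growth - c_total


def best_length(benefit_unit, r_unit, c_total, c_setup, c_unit):
    """Grid search for the test length that maximizes adaptive_gain."""
    grid = [i / 100 for i in range(1, 3001)]  # k from 0.01 to 30.00
    return max(grid, key=lambda k: adaptive_gain(
        k, benefit_unit, r_unit, c_total, c_setup, c_unit))


k_star = best_length(0.2, 0.10, 1.0, 0.05, 0.02)
k_larger_benefit = best_length(2.0, 0.10, 1.0, 0.05, 0.02)

# Closed form from setting the derivative of adaptive_gain to zero (our derivation).
k_closed = math.sqrt(0.05 * (1 - 0.10) / (0.02 * 0.10))

print(k_star, k_larger_benefit, round(k_closed, 2))
```

Multiplying the unit-test benefit by ten leaves the best length, and hence the best bandwidth, where it was, in keeping with the statement that the optimum does not depend on $B_{y_1 d}$.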
Such exploration as this demonstrates that it is frequently profitable in
making independent decisions to divide time among several tests rather than
to devote all time to a single test. Too great a dispersion of effort, however,
is just as unprofitable as too great a concentration. Great bandwidth is
profitable for independent decisions only when the initial cost $C_0$ is small
relative to $C_1$. Personality questionnaires and interviews meet this requirement
and therefore can increase bandwidth profitably; in ability measurement,
set-up time for new dimensions is sufficient to place greater restriction on
bandwidth. The foregoing analysis assumes, however, that we cannot single out
certain dimensions as being particularly important, an assumption which is
contrary to fact in most selection and guidance.

Non-equivalent, independent decisions. The values of $B_{y_1 d}$ vary from test
to test if the unit-test validities differ, or if the decisions served by the
tests are not equally important, involve different numbers of treatments, etc.
The optimal $k_d$ increases as $B_{y_1 d}$ increases, under either fixed or adaptive
treatment. In general, it is profitable to lengthen the tests bearing on the more
important decisions, even though this reduces bandwidth. This principle requires
further qualification, however, when $r_{y_1 y_1}$ or the ratio $C_0/C_1$ differs
from test to test.
Appendix 7 indicates a general solution for the optimum distribution of
effort in adaptive placement which takes into account all parameters of the
unit test, provided one defines the unit test by fixing $C_1$. Then, designating the
length of any one test as $k_d$, the gain in utility over a set of v tests is

$\sum_{d=1}^{v} \Delta U_d = \sum_d B_{y_1 d} \dfrac{r_{y_k y_k, d}}{r_{y_1 y_1, d}} - \sum_d (C_{0d} + k_d C_1)$    [29] (7.14)

The total number of test units $K = \sum k_d$ is defined by

$K = \dfrac{C_T - \sum_d C_{0d}}{C_1}$    [30]

A composite parameter $D_d$ is introduced.

$D_d = \dfrac{\sqrt{B_{y_1 d}\,(1 - r_{y_1 y_1, d})}}{r_{y_1 y_1, d}}$    [31] (7.17)

Then, for any specified set of v tests, the optimum length of the test for the
decision d' is given by

$k_{d'} = \dfrac{D_{d'}}{\sum_d D_d}\left(K + \sum_d \dfrac{1 - r_{y_1 y_1, d}}{r_{y_1 y_1, d}}\right) - \dfrac{1 - r_{y_1 y_1, d'}}{r_{y_1 y_1, d'}}$    [32] (7.21)

There is a similar equation for each other test. Appendix 7 discusses an
iterative procedure for comparing the utility achieved using one set of tests, with
their optimal lengths, to that from another set of tests. No comparable explicit
formula for $k_d$ can be obtained for fixed treatment, because the system of
equations (7.6) of the appendix has not been solved.


Horst's procedure for finding the optimal group of subtests and their
appropriate length is also an iterative method, but allows the tests and criteria to
be correlated so that a change in length for any one test may affect the utility
for all decisions. With this added complication, the optimum lengths cannot
be expressed by an explicit formula such as [32] and must be determined by
successive approximations.
Our special case gives a somewhat clearer picture of the factors affecting
distribution of effort. Increase in $B_{y_1 d}$ (relative to $C_1$) for one particular
decision increases $k_d$. Increase in $r_{y_1 y_1, d}$ for any d markedly decreases $k_d$. When
tests are alike with respect to these parameters it is best to divide time evenly
among them. As one test takes on greater validity or importance, other things
remaining equal, it is better to give more time to that test. As the benefits
from the unit tests become more unequal, a point is reached where it is more
profitable to spend all available time on one test. For a dimension where unit
tests have low intercorrelations, benefit increases greatly with length; but
where the intercorrelations are high, there is little value in lengthening the
test.

Koopman (47) has made a general study of distribution of effort between
two experiments. He introduces an exponential "return function" relating
benefit to length of test, which is comparable to our function based on the
Spearman-Brown formula. Despite this difference, his conclusions are
consistent with ours.


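We present a hypothetical numerical example to illustrate these
conclusions. Consider five tests, with these respective values of $B_{y_1 d}$ (relative
to $C_1$) and $r_{y_1 y_1, d}$ at unit length: 5, .30; 1, .30; 0.5, .30; 0.5, .10; 0.1,
.10. We assume $C_0$ to be negligible and let K = 20. Computing from [32]
gives these respective lengths: 11.2, 3.7, 2.0, 5.6, and -2.5. The negative
value implies that test 5 yields insufficient benefit to be used when K = 20,
so it must be dropped from the set. When the optimum lengths are recomputed
with v = 4, tests 2, 3, and 4 receive 3.3, 1.7, and 4.6 units, and test 1 receives
the remainder of the 20 units. The values of the parameters in this example
were deliberately spread over wide ranges: 50:1 for benefit at unit length
and 3:1 for intercorrelation. Tests 3 and 4 have the same benefit at unit length,
but because the intercorrelations for test 4 are smaller, it has a much
greater optimum length. The optimum lengths of the tests are ordered according to
the ratio of $B/C_1$ to r.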
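
The lengths in this example can be recomputed directly. The sketch below maximizes the total adaptive benefit $\sum_d B_d\, k_d / (1 + (k_d - 1) r_d)$ subject to $\sum_d k_d = K$, which gives $k_d = D_d \,(K + \sum (1 - r)/r) / \sum D - (1 - r_d)/r_d$ with $D_d = \sqrt{B_d (1 - r_d)}/r_d$. This closed form is our own reconstruction by a Lagrange-multiplier argument, assuming the Spearman-Brown relation and negligible set-up cost; the function and variable names are hypothetical. It is offered because it reproduces the lengths quoted in the text.

```python
import math


def optimal_lengths(benefits, r_units, total_units):
    """Optimum length of each test under adaptive treatment, in unit tests.

    benefits[d] is the unit-test benefit relative to unit cost; r_units[d] is
    the intercorrelation of unit tests for test d. A negative result means the
    test does not earn a place in the battery at this total length.
    """
    d_values = [math.sqrt(b * (1.0 - r)) / r for b, r in zip(benefits, r_units)]
    offsets = [(1.0 - r) / r for r in r_units]
    scale = (total_units + sum(offsets)) / sum(d_values)
    return [d * scale - off for d, off in zip(d_values, offsets)]


benefits = [5.0, 1.0, 0.5, 0.5, 0.1]
r_units = [0.30, 0.30, 0.30, 0.10, 0.10]

five_tests = optimal_lengths(benefits, r_units, 20)
# Test 5 comes out negative, so it is dropped and the lengths are recomputed.
four_tests = optimal_lengths(benefits[:4], r_units[:4], 20)

print([round(k, 1) for k in five_tests])
print([round(k, 1) for k in four_tests])
```

The first list printed agrees with the five lengths given above, and the second shows how the time released by dropping test 5 is redistributed over the remaining four tests.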
Generalizations. In establishing generalizations, we pool the information
derived from our analysis, and also from the work of Horst and Koopman.
These results apply to a single-stage battery used as a basis for terminal
decisions.

1. It is generally profitable to divide testing time among several tests
rather than to concentrate on a single test, when many decisions
are to be made regarding the same persons.

2. Using fewer tests than there are decisions is more profitable than
dividing time over all decisions when (a) set-up costs are
not negligible, or (b) the contributions of the tests at equal length
are unequal.

3. For any given problem there is an optimal distribution of effort,
both with respect to number of tests to be given and amount of time
to be devoted to each test. Where the tests are uniform, it is best
to divide time equally among the tests given. Where they are
not uniform, it is profitable to allow time to the tests roughly in the
order of the ratio of their contribution per unit cost to the
intercorrelation among unit tests.

4. It is not critically important to employ precisely the optimum length
of test. Minor departures from the optimum reduce utility very
little in the cases we have examined.


Our analysis confirms that there is indeed an optimal compromise between
"bandwidth" and "fidelity" in testing for compound placement decisions.
Similar results are to be expected for classification decisions. The validity
coefficient of a test relative to a particular decision describes only one
aspect of its usefulness. Where it is in competition with other tests, the
amount of time to be given to it depends upon its contribution per unit cost
and the correlation among units. The cumulative benefit from a series of
moderately valid tests may outweigh the benefit to be expected from a smaller
number of more dependable tests.


CLASSIFICATION DECISIONS 


Classification decisions, unlike those previously discussed, are necessarily
based on multivariate information. ... work on classification in the language
of decision theory. Our discussion brings together a variety of hitherto
isolated problems, and also raises many important questions. ... treatment
classification ... discussed in a less technical fashion, our aim being to
examine the meaning of adaptive treatment in a classification problem. The
proposals of Horst (41, 42) regarding the design of classification or
differential-prediction batteries are then considered in terms of the
assumptions about the decision problem which they appear to invoke. Finally,
we consider possible development of sequential methods in classification.


CLASSIFICATION WITH FIXED TREATMENTS 


The most common classification problem is one in which certain fixed
treatments are available, and each man is to be assigned to one of them. Each
man $i$ yields a certain payoff $e_{it}$ under whatever treatment $t$ he
receives; with $n$ treatments, there are $n$ values of $e_{it}$ for him. Some
writers formulate an assignment strategy in terms of operations upon the n by
n matrix of estimated payoffs. We, however, shall consider the test scores
rather than the estimated payoffs as the starting point, to obtain a
formulation readily applicable to large samples.
A set of tests $y_a, y_b, \ldots$ is available. We assume that payoff is a
first-degree function of scores on these tests. For each treatment $t$, there
is a linear combination of scores $Y_t$ determinable by multiple-regression
methods which best predicts the payoff under that treatment. The adequacy of
this prediction is indicated by the multiple correlation $R_{Y_t e_t}$. The
several $Y_t$ need not be uncorrelated. (For the reject treatment, payoff is
assumed to be zero regardless of the test scores, but this function is merely
a degenerate case of the ordinary regression equation.)


A vector notation ($\tilde{y}_i$ or $\tilde{Y}_i$) is used to indicate the
particular pattern of scores belonging to person $i$; for each $\tilde{y}$
there is a corresponding $\tilde{Y}$. When treatment $t$ is applied to a man
having a given pattern of predictor scores, the estimated payoff is

$$\hat{e}_{it} = \sigma_{e_t} R_{Y_t e_t} Y_{it} + \bar{e}_t \qquad [33]$$


Any strategy for assignment states a complete set of values of
$p_{t|\tilde{y}}$ or $p_{t|\tilde{Y}}$. Regardless of whether the strategy
employed is optimal or not, we can express as follows the total utility which
is attained by using it:

$$U = \sum_t \sigma_{e_t} R_{Y_t e_t} \int p_{t|\tilde{Y}}\, p_{\tilde{Y}}\, Y_t \, d\tilde{Y} + \sum_t p_t \bar{e}_t - \sum C_y \qquad [34]$$


This equation applies to either fixed or adaptive quotas. In the second term,
$p_t$ is the proportion of persons assigned to the treatment. The integration
is over all values of $\tilde{Y}$. In an optimal strategy, all persons having
the same $\tilde{Y}$ are assigned to the same treatment; $p_{t|\tilde{Y}}$ is
1.00 for one treatment and zero for all others.
For each treatment there is a payoff function having the form of [33]. These
equations describe hyperplanes which intersect pairwise in hyperplanes of one
less dimension. The intersections divide the space defined by the $\tilde{Y}$
dimensions into regions such that for each region there is one particular
treatment which gives the highest obtainable payoff for every person in the
region. Where there is no quota constraint, these intersections therefore
indicate the optimum strategy.
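The quota-free rule just described can be sketched in a few lines: evaluate
the payoff function of form [33] for every treatment and assign the person to
the treatment with the largest estimate. The treatment labels and all
parameter values below are hypothetical, chosen only for illustration; the
reject treatment is represented as the degenerate function that is zero
everywhere.

```python
# Hypothetical parameters for each treatment: sigma_e (payoff s.d. in utility
# units), R (multiple correlation of the composite with payoff), e_bar (mean
# payoff). None of these values comes from the text.
TREATMENTS = {
    "A": {"sigma_e": 2.0, "R": 0.50, "e_bar": 0.2},
    "B": {"sigma_e": 1.0, "R": 0.60, "e_bar": 0.5},
    "reject": {"sigma_e": 0.0, "R": 0.0, "e_bar": 0.0},
}

def estimated_payoff(sigma_e, R, Y, e_bar):
    """First-degree payoff function: sigma_e * R * Y + e_bar."""
    return sigma_e * R * Y + e_bar

def assign(composites, treatments=TREATMENTS):
    """composites maps each treatment to the person's standardized composite
    Y_t. Returns (treatment with highest estimated payoff, that payoff)."""
    payoffs = {
        t: estimated_payoff(p["sigma_e"], p["R"], composites[t], p["e_bar"])
        for t, p in treatments.items()
    }
    best = max(payoffs, key=payoffs.get)
    return best, payoffs[best]

print(assign({"A": 1.0, "B": -0.5, "reject": 0.0}))
print(assign({"A": -2.0, "B": -2.0, "reject": 0.0}))
```

The boundaries between regions never need to be computed explicitly; they are
implied by where the largest payoff changes from one treatment to another.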

Where quotas must be satisfied, the hyperplanes must be shifted parallel
to themselves in such a way that the adjusted regions contain the desired
number of persons. Cardinet (17) has developed a procedure for graphic
determination of this strategy where there are just three treatments. From
the estimated payoffs for the three treatments, the first centroid factor is
extracted. The scores for each person on the two remaining orthogonal
components are plotted. Three lines in this space indicate the locus of
points where two of the estimated payoffs are equal. These would serve as
boundaries of regions in the quota-free case. If fixed numerical quotas are
to be satisfied, one counts the number of persons in each region. After
observing which regions have too many persons and which have too few, one
shifts the boundary lines in the appropriate direction parallel to
themselves; to facilitate this step, Cardinet draws the lines on a plastic
overlay which slides to the new position. A count is made of the number of
persons added to or subtracted from each region by each shift. The position
of the lines which satisfies the quotas is quickly located.


The strategy assigns all persons in a certain region to $t$, and the integral
in equation [34] is equivalent to the integral of
$p_{\tilde{Y}} Y_t \, d\tilde{Y}$ over that region. $Y_t$ will not in general
be normally distributed within this subgroup even when $Y_t$ is normally
distributed over all individuals. The reader may confirm this by considering
the very simple case of three treatments, the third being reject, and the
composite scores $Y_1$ and $Y_2$ being uncorrelated. Suppose further that
$\sigma_{e_1} R_{Y_1 e_1} = \sigma_{e_2} R_{Y_2 e_2}$, and that the quota
$p_{t_1} = p_{t_2}$. Then to treatment $t_1$ will be assigned those persons at
the upper end of the $Y_1$ distribution, excepting those for whom
$Y_2 > Y_1$. Because of this last requirement, the distribution of $Y_1$ for
those assigned to $t_1$ will not be normal. The lack of normality of the $Y$
distribution makes it impossible to simplify the equations for utility in
classification.
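The simple case suggested to the reader can also be checked by simulation.
The sketch below is only an illustration under assumed values (a cutoff of
1.0 standard deviations, 40,000 simulated persons, a fixed random seed); it
compares the men actually assigned to $t_1$ with the plain upper tail of
$Y_1$ that would arise in selection on a single variable.

```python
import random

def simulate(n=40000, cutoff=1.0, seed=0):
    """Draw uncorrelated standard-normal composites Y1, Y2 (hypothetical
    setup). Returns (Y1 of everyone above the cutoff, Y1 of those who are
    above the cutoff AND not better suited to treatment 2)."""
    rng = random.Random(seed)
    truncated, assigned = [], []
    for _ in range(n):
        y1, y2 = rng.gauss(0, 1), rng.gauss(0, 1)
        if y1 > cutoff:
            truncated.append(y1)      # simple upper-tail group on Y1 alone
            if y1 >= y2:
                assigned.append(y1)   # the group actually assigned to t1
    return truncated, assigned

truncated, assigned = simulate()
mean = lambda xs: sum(xs) / len(xs)
print(len(truncated), len(assigned))
print(mean(truncated), mean(assigned))
```

Because the men lost to $t_2$ come disproportionately from just above the
cutoff, the assigned group is not a simple truncated normal, and its mean
cannot be read from the ordinary normal table.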


To study the gain in utility and the factors which determine it, we now
restrict ourselves to the fixed-quota case. The a priori utility is
$\sum_t p_t \bar{e}_t$, and the gain in utility from testing is

$$\Delta U = \sum_t \sigma_{e_t} R_{Y_t e_t} \int p_{t|\tilde{Y}}\, p_{\tilde{Y}}\, Y_t \, d\tilde{Y} - \sum C_y \qquad [35]$$

Writing $\bar{Y}_{t(t)}$ for the average value of $Y_t$ among those assigned
to treatment $t$,

$$\Delta U = \sum_t \sigma_{e_t} R_{Y_t e_t}\, p_t\, \bar{Y}_{t(t)} - \sum C_y \qquad [36]$$


Equation [10], which described utility in placement with fixed treatment and
fixed quotas, is a special case of [36], since the corresponding quantity in
[10] equals $p_t \bar{Y}_{t(t)}$. Whereas [10] could be evaluated directly
from the tabled normal distribution, it is difficult to evaluate [36]. For
three treatments it is possible to make use of the Pearson tables of the
bivariate distribution, but the computations for even this limited case are
laborious. We have not evaluated equation [36] for various values of the
parameters, but we may review some of the findings of Brogden (13) who
computed, under certain special assumptions, the average utility per man
selected.

Brogden considers the selection-classification problem where some men are to
be rejected and the remainder are divided among several treatments. He
determines the value of decisions based on a single "general" predictor
(equally correlated with outcome under every treatment) and compares this to
the value obtained by using a separate "differential" predictor for each
criterion. The general predictor permits rejection of the poorest men, but
the men it accepts can be allocated to treatments only by chance. The
differentially-scored battery predicts the outcomes for the person under the
several treatments and therefore, insofar as quotas permit, one can assign
him to the treatment which promises the largest outcome from him. Brogden
assumes zero intercorrelations among the predictor scores $Y_t$ obtained from
the differential battery, and assumes that the correlation $R_{Y_t e_t}$
between an outcome and its corresponding predictor score is the same for all
treatments. Other assumptions include first-degree payoff functions, equal
$\sigma_{e_t}$, and a normal distribution of $Y$'s. The quotas are equal for
all treatments other than reject.

The value of each type of battery is determined for varying numbers of
treatments. Figure 35 shows the benefits obtained when the number of
treatments for accepted men varies from 1 to 5, the number of predictors in
the differential battery increasing accordingly. The quota for each treatment
other than reject is fixed at 20%. The validity for the general predictor
with respect to each treatment is taken as .50, and each validity coefficient
$R_{Y_t e_t}$ of the differential battery is .50. Where just one treatment is
available for accepted men, the differential predictor is identical to the
univariate one. With two treatments in addition to reject, 40% of the group
is accepted. The average quality of men accepted is necessarily lower than in
the first case, but the decline in $\Delta U$ is much less serious with the
differentially scored battery than with the single predictor. As the number
of treatments is further increased the advantage of the differential battery
becomes even more apparent. Brogden's example, one should note, assumes that
differential predictors have the same level of validity as the general
predictor, which is not true for differential batteries so far developed.
These mathematical results, however, encourage efforts to improve
differential predictors, since their potential advantage is great.

[Figure 35. Quality of men assigned with unidimensional and differential
information. Vertical axis: average aptitude of men assigned. Horizontal
axis: number of treatments for accepted men (n).]
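The pattern Brogden reports can be approximated by a small Monte Carlo
sketch. It is not Brogden's computation: it assumes standard-normal,
uncorrelated predictors, takes the expected standardized outcome of an
assigned man to be .50 times his score on the predictor used, and relies on
symmetry so that accepting the top fraction on the highest differential score
fills the equal quotas approximately rather than exactly. Sample size and
seed are arbitrary.

```python
import random

R = 0.50       # validity of every predictor, as in Brogden's example
QUOTA = 0.20   # quota for each treatment other than reject

def mean_top(values, fraction):
    """Mean of the highest `fraction` of the values."""
    k = int(round(len(values) * fraction))
    top = sorted(values, reverse=True)[:k]
    return sum(top) / k

def simulate(n_treatments, n_people=50000, seed=1):
    """Returns (average expected outcome per accepted man with one general
    predictor, the same with a differential battery)."""
    rng = random.Random(seed)
    accept = QUOTA * n_treatments
    # General predictor: accepted men are allocated to treatments by chance.
    general = [rng.gauss(0, 1) for _ in range(n_people)]
    # Differential battery: each man goes to the treatment on which his
    # predicted outcome is highest, so only his best score matters.
    best = [max(rng.gauss(0, 1) for _ in range(n_treatments))
            for _ in range(n_people)]
    return R * mean_top(general, accept), R * mean_top(best, accept)

for n in range(1, 6):
    g, d = simulate(n)
    print(n, round(g, 2), round(d, 2))
```

With one treatment the two columns should agree within sampling error; as
treatments are added the general-predictor column falls toward zero while the
differential column declines much more slowly.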

In Brogden's results it is notable that gain in utility has no simple
relation to the predictive validity of the tests. The univariate predictor
has the same validity as any of the differential predictors, yet the
differential battery yields the greater gain in utility. Equation [34]
indicates that, as we have previously found for other fixed-treatment
fixed-quota decisions, utility is a linear function of the validity
coefficients. ... is altered, the payoff surfaces ... This should lead to a
change in the assignment strategy and thus change the value of
$\bar{Y}_{t(t)}$. The net change in utility is therefore a quite indirect
function of the change in validity.

To clarify the relation of utility to differential validity, we can express
equation [36] in terms of orthogonal variables. The tests
$y_a, y_b, \ldots$ may be resolved into orthogonal components
$y_1, y_2, \ldots$. While any set of orthogonal ..., it is perhaps
worth ... Any components of the test battery which are uncorrelated with
outcome will then disappear from the estimation equations. Equation [36]
holds when the $Y_t$ are predicted from the estimated $y_{ij}$ instead of
from the original test scores. Under the assumption of a first-degree payoff
surface,

$$\hat{e}_{it} = \sigma_{e_t} \sum_j r_{y_j e_t}\, y_{ij} + \bar{e}_t \qquad [37]$$
We may introduce a special notation for the purpose of considering
differential validity. Let $t'$ be the treatment to which the man is assigned
and $\bar{y}_{j(t')}$ be the average $y_j$ among those assigned to $t'$.
Then, by a development analogous to that for [36], we find that if each man
is assigned to the treatment optimal for him

$$\Delta U = \sum_{t'} \sum_j p_{t'}\, \sigma_{e_{t'}} r_{y_j e_{t'}}\, \bar{y}_{j(t')} - \sum C_y \qquad [38]$$

For any $y_j$, $\sum_{t'} p_{t'} \bar{y}_{j(t')} = 0$, and therefore over all
$t$ and $j$,

$$\sum_t \sum_j p_t\, \sigma_{e_t} r_{y_j e_t} \sum_{t'} p_{t'}\, \bar{y}_{j(t')} = 0$$

This expresses the gain in utility which results from assigning persons at
random to all treatments in the proportions $p_t$, and is of course equal to
zero, according to the definition of a priori utility. Subtracting this
expression from the right member of [38] and rearranging, we obtain

$$\Delta U = \sum_j \sum_{t'} p_{t'}\, \bar{y}_{j(t')} \sum_t p_t \left( \sigma_{e_{t'}} r_{y_j e_{t'}} - \sigma_{e_t} r_{y_j e_t} \right) - \sum C_y \qquad [39]$$

This equation is analogous to [11] for gain in utility in placement. The term
in parentheses is the difference in the slopes of the payoff functions for
the specified pair of treatments with respect to a particular $y_j$. This
equation demonstrates again the well-known fact (71) that the value of a
measure for classification depends not on its ability to predict outcome
within treatments, but on its interaction with treatment. We remind the
reader, however, that the hyperplanes bounding the region where persons are
assigned to $t'$ shift as the validity shifts; therefore, even when the quota
remains constant, $\bar{y}_{j(t')}$ varies with the validity. This
complicates the relation between utility and validity.
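The step from the within-treatment form to the slope-difference form can be
verified numerically. The sketch below uses invented slopes
$\sigma_{e_t} r_{y_j e_t}$ for three treatments and two orthogonal
components, sets every mean payoff to zero, omits the cost term (it is the
same in both forms), and centers each simulated component so that the
weighted group means sum exactly to zero. All values are hypothetical.

```python
import random

def check_identity(n_people=2000, seed=3):
    """Returns the gain computed from slopes within the assigned treatment
    and from slope differences between treatments (costs omitted)."""
    rng = random.Random(seed)
    # Hypothetical slopes sigma_e * r for components y_1, y_2.
    slopes = {"t1": [0.6, 0.1], "t2": [0.2, 0.5], "t3": [0.0, 0.0]}
    n_dims = 2
    y = [[rng.gauss(0, 1) for _ in range(n_dims)] for _ in range(n_people)]
    for j in range(n_dims):  # center each component at exactly zero
        mj = sum(row[j] for row in y) / n_people
        for row in y:
            row[j] -= mj
    groups = {t: [] for t in slopes}
    for row in y:  # assign each man to his highest estimated payoff
        best = max(slopes,
                   key=lambda t: sum(b * v for b, v in zip(slopes[t], row)))
        groups[best].append(row)
    p = {t: len(g) / n_people for t, g in groups.items()}
    ybar = {t: [sum(r[j] for r in g) / len(g) if g else 0.0
                for j in range(n_dims)] for t, g in groups.items()}
    within = sum(p[t] * slopes[t][j] * ybar[t][j]
                 for t in slopes for j in range(n_dims))
    between = sum(
        p[t1] * ybar[t1][j]
        * sum(p[t] * (slopes[t1][j] - slopes[t][j]) for t in slopes)
        for j in range(n_dims) for t1 in slopes)
    return within, between

print(check_identity())
```

The two numbers agree to rounding error for any assignment rule, which shows
that only differences in slope between treatments contribute to the gain.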
The discussion of implications on pages 67-68 applies to classification as
well as to placement. As was mentioned there, the importance of differential
validity has been discussed previously, but the role of $\sigma_e$ has been
too little recognized. For a test (or a factor) to be useful in dividing men
between two treatments, it is necessary that $\sigma_e r_{ye}$ for treatment
$t_1$ differ from the corresponding product for $t_2$; inequality of the
correlations does not suffice. For example, suppose one predictor dimension
has validities .20 and .60 for two treatments. This might seem to promise
differential value, but if the respective $\sigma_e$ are 3 and 1, the
predictor is not useful for making assignments. Conversely, we might have
thought that a factor has no differential value if its correlation with
outcome is .40 for each treatment. If the variance of the two payoffs is
different, however, making assignments on the basis of this factor may yield
appreciable gains in utility. Analyses of outcomes expressed in
standard-score form have led to oversimplified discussions of the value of a
classification test solely in terms of validity coefficients. In a given
practical situation, a difference of one standard deviation in outcome will
not have the same value for every treatment. The importance of considering
the slope of the payoff function in utility units is therefore obvious.
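The two cases in this paragraph reduce to a comparison of slopes in utility
units. A minimal sketch follows; the first call uses the figures given in the
text, while the payoff standard deviations of 3 and 1 in the second call are
an assumption added only to make the converse case concrete.

```python
import math

def slope(sigma_e, r):
    """Slope of the payoff function in utility units: sigma_e * r."""
    return sigma_e * r

def differential_slope(sigma_1, r_1, sigma_2, r_2):
    """Difference between the payoff slopes of two treatments."""
    return slope(sigma_1, r_1) - slope(sigma_2, r_2)

# Validities .20 and .60 with payoff s.d. 3 and 1: the slopes coincide.
unequal_validities = differential_slope(3, 0.20, 1, 0.60)
# Validity .40 for both treatments, assumed payoff s.d. 3 and 1.
equal_validities = differential_slope(3, 0.40, 1, 0.40)

print(math.isclose(unequal_validities, 0.0, abs_tol=1e-12))
print(equal_validities > 0)
```

A zero difference means the predictor cannot separate the two treatments,
however different the correlations look in standard-score form.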
It will be noted that the estimation of payoffs in this section has been based
on regression formulas, and that the intersection of regression surfaces has
been used to determine the boundaries of assignment regions. We have avoided
use of the well-known Fisherian discriminant functions because the underlying
model of the discriminant function appears unsuitable for personnel decisions.
The basic concept underlying the discriminant function is that a number of
classes exist, to each of which a particular individual "belongs".
Differences among members of a given class are disregarded. This function
divides the score-space into regions so as to maximize the probability of
"correct" classification of each individual. It is possible to weight errors
of a given type or to specify acceptable Neyman-Pearson risks for each of the
possible types of misclassification. The regression model, on the other hand,
is consistent with the view that there is no uniform cost of assigning a
person "belonging in category A" to category B. Instead, the cost depends
upon the pattern of scores.


The theory of the geneticist is based on discrete classes such as
species, and it is meaningless for him to speak of different payoffs from
allocating different members of the same class in the same way. Here, the
discriminant function is fully appropriate. The model is much less
appropriate in personnel decisions where there is no theory of qualitatively
different types of persons. Even in clinical diagnosis where categorical
labels are commonly used, the discriminant function is probably unsuited.
... phrenic" ... strategy. By assuming that ... test scores, we arrive ...
the true shape of such functions in classification, ...

CLASSIFICATION WITH ADAPTIVE TREATMENT 


In both selection and placement, the c... significant findings. ...
classification could be defined, even though we do not go on to develop
formulas for this case.
It has been customary in job classification, in clinical diagnosis, etc., to
think of each treatment as discrete and qualitatively different from the
alternatives. While qualitative differences distinguish such pairs of
treatments as lobotomy and psychotherapy, in many other personnel decisions
we may think of a continuous multidimensional manifold of treatments (cf.
pp. 24-25). Thus, while parole and continued imprisonment seem at first
glance to be qualitatively different, one may conceive of a continuum between
freedom and strict incarceration. Intermediate stages include relaxation of
supervision in prison, furloughs, and parole with mandatory reporting. The
full range of possible treatments can be characterized only by considering
many dimensions; besides degree of liberty, there can be variation in the
amount of education and rehabilitation offered, the degree of sympathy
displayed by supervisors, and other aspects of the treatment.
This concept of adaptation is relevant to most areas of applied psychology
where treatments have hitherto been viewed as discrete and unordered.
Teaching methods certainly may vary by degrees along such continua as amount
of active practice, rapidity of pacing, and amount of explanation. Between
jobs there is often a similar series of intermediate possibilities combining
the features of the several jobs in various degrees. In the Strong Vocational
Interest Blank, for example, Chemist and Author-journalist seem like
divergent vocational paths, but there exist intermediate opportunities such
as Science reporter or Editor of chemical reports. Strong's factor analyses
of interests indeed may be regarded as identifying some of the continua along
which occupations (i.e., treatments) differ.
Whether treatments can be adapted will depend on the situation. Often,
although it is impossible to adapt treatment to each individual, it is
possible to adapt to the average level of aptitude in each group. Therefore,
as in selection (see p. 44), the ultimate aim of research becomes the
discovery of the best combination of treatment categories and classification
procedures, within whatever practical constraints exist.

In order to examine adaptive classification formally, it would be necessary
to introduce aptitude dimensions $s_1, s_2, \ldots$. This could be done
simply by defining each $s$ as the "true score" on one of the orthogonal $y$
dimensions previously discussed, so that $r_{y e_t} = r_{ys} r_{s e_t}$ for
each corresponding $y$ and $s$. A person's pattern of $s$ scores is
independent of the lengths of the tests used to measure his $y$ pattern. In
placement, the treatment was characterized by the slope of the payoff
function with respect to a single $s$ dimension. When there are several $s$
dimensions, the payoff function for each treatment has a slope with respect
to each of these dimensions. The nature of the continuous payoff surface
involving all treatments and all dimensions is therefore much more
complicated than in placement. Since at present we have no empirical facts
about even the simple surfaces involved in adaptive placement, it would be
unduly speculative to consider what form the surface for a classification
problem might take. Ultimately it should be possible to choose reasonable
assumptions and to derive the relations between utility, the characteristics
of the decision problem, and the tests employed.


DESIGN OF TESTING PROCEDURES 


Single-stage batteries 


How to design a test battery which will be maximally efficient for the
classification of personnel is a problem of great importance, for which only
Horst has offered solutions. By examining what assumptions about the decision
problem his model entails, we can provide a partial basis for evaluating
them. His two proposals deal with constructing the best batteries for
"multiple absolute prediction" (42) and "differential prediction" (41). We
have already seen (p. 91) that the former procedure is designed to select
that subset of tests which will maximize $\sum_t R^2_{Y_t e_t}$. This is an
appropriate index of gain in utility in compound adaptive placement, where
several treatments will be given to each individual, each treatment being
adjusted to his ability on one dimension.

Horst's other proposal is to select tests which maximize the sum of the
squared "differential validities" of the test battery. A differential
validity coefficient indicates how well the given tests can predict di...
By simple analogy to the compound adaptive placement problem, we can
immediately place one interpretation on the Horst "differential prediction"
procedure. This procedure minimizes the sum of squared errors of estimates of
criterion differences, whereas the "absolute prediction" procedure minimizes
the same errors for the original criteria. This formula then does arrive at
the best battery for a series of compound decisions, where each decision
depends on the measured difference between two criterion scores. Every pair
of differences is taken into account successively in determining the series
of treatments for the individual.

... single variable; ... In this case, as [36] makes clear, util... rather
than their squares, and the boundaries ... into the determination of utility.
The Horst solution, moreover, makes no adequate provision for a reject group
who receive no courses, job assignments, etc. Thus his analysis would apply
only when all individuals tested are to be utilized.


Perhaps a generalized payoff function for an adaptive-treatment problem
could be found such that gain in utility would be proportional to the sum of
squared errors which constitute Horst's index. This would require that the
payoffs under various criteria have equal variances, that the criterion
dimensions have equal weight, and that the summed cross-products of errors be
unimportant. It appears that only in a very special case will Horst's index
be exactly proportional to gain in utility.


Horst has offered the only systematic procedure for maximizing the
efficiency of a classification battery, and thus has taken an important
pioneering step. The function used to define efficiency does not correspond
clearly to any common type of decision problem, and it is demonstrably not
the correct function for the fixed-treatment example to which Horst applies
the method. Further work may show that Horst's method is a useful
approximation for cases where some other efficiency index would be preferable
on logical grounds. Until such evidence is available, we cannot have
confidence that the battery developed by Horst's method will be maximally
efficient for a particular decision problem.


Sequential testing 


Sequential methods should be especially advantageous in classification
because a large number of dimensions are usually involved and exhaustive
measurement is out of the question. Often a limited amount of information
will serve to eliminate a number of treatments from consideration, and to
indicate one or two of the dimensions as especially critical in choosing
among the remaining treatments. If, for example, a general aptitude battery
consisting of a series of short tests shows that person $i$ stands high on
the aptitudes required for jobs A and B, the next stage of testing can
concentrate on whatever aptitudes differentiate A from B. For another person
having a different pattern on the general battery, quite different tests
would be relevant to the final decision. Such sequential testing in which the
second stage is tailored for the individual should be more efficient than a
non-sequential plan.

The basic theory for multidimensional sequential testing can be derived
from an unpublished paper by Magwire (52). He was investigating "the
sequential choice of experiments", but his methods appear to be directly
applicable to testing of individuals. He states the decision problem in a
general form. A number of tests are available. After any stage of testing,
the individual may be assigned to a fixed treatment or may be held for
further testing. At each stage of testing the person's expected payoff under
each treatment can be estimated. If the payoff under one treatment is greater
than would probably be obtained by testing him further, thus arriving at a
more certainly correct decision but incurring additional costs of testing, a
terminal decision is made. Wherever it is profitable to test further,
Magwire's procedure indicates which test should be given next. The procedures
developed are a generalization of Wald's recursion formula (69), and take
into account the payoff functions and the cost of testing.
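The stopping logic described here can be illustrated with a one-step
look-ahead rule for two treatments. This is not Magwire's procedure, which
the text does not reproduce; it is a standard preposterior calculation under
assumed conditions: the payoff difference between treatments A and B is
currently believed to be normal with mean `mean_diff` and variance
`post_var`, and one further test would observe that difference with
independent normal error of variance `error_var` at a price `cost` in utility
units. All names and values are hypothetical.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def value_of_stopping(mean_diff):
    """Expected payoff (relative to treatment B) of deciding now."""
    return max(mean_diff, 0.0)

def value_after_one_more_test(mean_diff, post_var, error_var, cost):
    """Expected payoff of giving one more test and then deciding.
    The revised mean is normal around mean_diff with s.d. tau."""
    tau = math.sqrt(post_var ** 2 / (post_var + error_var))
    z = mean_diff / tau
    return mean_diff * Phi(z) + tau * phi(z) - cost

def should_test_further(mean_diff, post_var, error_var, cost):
    """True when one more test is expected to pay for itself."""
    return (value_after_one_more_test(mean_diff, post_var, error_var, cost)
            > value_of_stopping(mean_diff))

print(should_test_further(0.0, 1.0, 1.0, 0.1))  # undecided case, cheap test
print(should_test_further(0.0, 1.0, 1.0, 0.5))  # undecided case, costly test
print(should_test_further(3.0, 1.0, 1.0, 0.1))  # one treatment clearly ahead
```

A full sequential strategy would look more than one step ahead and would also
choose among several candidate tests; the sketch shows only how payoff
functions and testing cost enter the decision to stop.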


From Magwire's work it appears possible to develop an ideal sequential
strategy for the quota-free fixed-treatment case which is often encountered
in guidance or diagnosis. A strategy can also be developed for the case where
population quotas are fixed, provided that the population distribution is
known. Where a finite numerical quota must be exactly satisfied in a
particular sample, however, Magwire's solution would not apply. At this time,
no use of Magwire's formula in actual computations has been reported, and the
computations may prove far too laborious for practical use. Even so, the
theory opens the way ...


EVALUATION OF OUTCOMES 


The assignment of values to outcomes is the Achilles heel of decision theory.
Once outcomes have been evaluated, one can proceed in a fully rigorous
fashion to compare particular decisions or general strategies. The evaluation
of outcomes, however, seems often to be arbitrary and subjective, leading one
to question whether any of the conclusions from decision theory can be
trustworthy if the starting point itself is open to dispute.

The most telling answer to this objection is to point out that decision theory
invokes no more subjective evaluation than does any method of arriving at
courses of action. Every choice between actions involves evaluations, and
every doctrine or set of principles embodies value judgments. Decision theory
is no more dependent on evaluation than is traditional measurement theory or
discriminant analysis. The unique feature of decision theory or utility
theory is that it specifies evaluations by means of a payoff matrix or by
conversion of the criterion to utility units. The values are thus plainly
revealed and open to criticism. This is an asset rather than a defect of this
system, as compared with systems where value judgments are imbedded and often
pass unrecognized.


It is always difficult to set down a payoff matrix assigning comparable
values to all the consequences of a decision, as those who have applied game
theory to military strategy have discovered. Comparing the value of bombing
an enemy city to the value of preventing a flood at home seems absurd
because, at first glance, these events are incommensurable. The comparison is
required, however, by the fact that a single act (e.g., shifting engineers
from dam construction to airfield construction) enhances the probability of
one outcome at the expense of the other. Whoever decides on such an act is
weighing the "incommensurables" on the same balance, whether he does so
consciously or not. Personnel decisions likewise require a balancing of
seemingly noncomparable outcomes. The personnel manager may let a
humanitarian outcome such as the self-respect of an aging worker offset
tangible losses in production. A school may select students on the basis of
factors (e.g., religion, social class) which have no relation to probable
academic achievement; in so doing, the policy maker is allowing outcomes
other than achievement to compensate him for the fact that he is not securing
the best possible students.

In clinical decisions, the outcomes include the duration of treatment, the
amount of staff effort absorbed, and the contribution (or cost) of the patient
to the community after release. It is at this point that the "balance sheet"
concept seems discordant with the humanitarian purpose of the institution.
The decisions made, however, must consider the community welfare. "We
had better take this patient in, to relieve strain on his family"; "We had
better use our space on cases where there is greater prospect of recovery";
"We cannot invest this much therapeutic time on a single patient"; "This
treatment does not cure, but it makes the patient more manageable" -- all
these statements reflect an intention to obtain maximum advantage for minimum
cost. Nor is this incompatible with concern for the patient's welfare. If it
wishes, the institution may include the patient's expressed feeling of
well-being in its evaluations. The critical point is that the institution
must still balance this against other outcomes; cheerfulness purchased at the
price of delaying the patient's becoming independent may be a bad bargain.


Much attention has been given to the logic of evaluation and to procedures
for making estimates of value (see 1, 4, 58, 65), but these difficult
problems have not been handled as successfully as the choice of strategy once
values are assigned. The progress to date consists largely in defining the
problem of evaluation and distinguishing among different approaches to the
problem. We shall review these possibilities as they might be applied to
personnel decisions.


Attention was drawn in Chapter II to differences between "institutional"
and "individual" decisions. In the former, a single decision maker is
concerned with a large number of decisions of the same sort, and a
reasonable meaning can be given to the concept of the total utility of a
set of decisions, since all the decisions are evaluated by the same payoff
matrix. The individual decision, on the other hand, must be evaluated by
the individual's personal payoff matrix, and the same person rarely
confronts the same decision repeatedly. There have been attempts to
formulate strategies which take into account simultaneously the welfare of
many individual decision makers. Such "welfare economics" [...] against
another's dissatis[...] against another only in terms [...]. It follows
that the indi[...]

INSTITUTIONAL EVALUATION OF OUTCOMES


Comprehensiveness of evaluations 


Although it is a truism to say that in evaluating a decision all outcomes
must be taken into account, the principle is deserving of some discussion.
Employment decisions are commonly validated solely against some measure
of the proficiency or rate of performance of the men accepted. There are,
however, many other consequences: some number of beneficial suggestions
or acts of leadership, some costs of training and supervision (see 27),
some degree of absenteeism and some spoilage of materials. The variety of
consequences of a decision is well illustrated by the studies of Raines,
Hunt, and associates (44, 54) who attempted to determine the value to the
Navy of a neuropsychiatric screening program. Among the consequences for
which actual criterion data could be obtained were frequency of subsequent
psychiatric discharges, frequency of bad conduct discharges, amount of
time spent in hospital, and frequency of disciplinary offenses. In
addition, they point out that there are probable differences between
screened and unscreened men in proficiency and efficiency of job
performance.


In estimating the cost to the military service of utilizing marginally
adjusted men, the assumption is sometimes made that this cost involves
solely those men who are discharged before they complete their required
period of service. It is taken for granted that, if a marginal recruit
manages to complete his enlistment and to receive an honorable discharge
at the end of his term, his service is ipso facto successful and he has
demonstrated his worth to the service. Unfortunately, completion of
service without discharge is no guarantee of the quality of the service
rendered.

. . . These results . . . show that such [marginal] men are more
'expensive' to the services, and that when their utilization is demanded
by manpower needs it will be necessary to make provision for the added
demands they will entail upon medical and disciplinary facilities. They
also confirm the clinical picture of the maladjusted individual as one
who, even when he is meeting the formal adjustment standards of his
group, is doing so at a greater cost to the group's medical facilities and
with greater friction upon its social organization than is his adjusted
compatriot. (54, pp. 12-13)

It is important to note how this detailed analysis differs from the
evaluation employed by Taylor and Russell, or Berkson and others who count
hits, misses, and false positives. In this situation, the data clearly
indicate wide differences in the quality of service among successful men,
and since these differences are predicted by the screening procedure they
have to be given a place in the evaluation system.

Goodman's discussion of the parole situation (37) exemplifies an even more
comprehensive consideration of outcomes. The purpose of the prison system
is to benefit society as a whole, and the parole board must take into
account all the social consequences of its decision. If the man is
paroled, he contributes to the community economy through his work, his
supervision by a probation officer costs so much, he perhaps commits a new
crime which has both direct and indirect costs, his children are better
(or worse) citizens by virtue of his presence in the home, and so on. For
the man not paroled, the above outcomes arise with some altered
probability when his term ends, and there is in addition the cost of his
longer imprisonment to consider. To such direct outcomes must be added the
indirect effect of a parole decision on the conduct of the remaining
prisoners and on potential criminals in the community. Finally, the parole
board gives some weight to purely human sympathy for the prisoner and his
family.

It is not easy to place a value on all these consequences of a decision,
especially as the effects extend indefinitely in time. In practice, it is
necessary to simplify the problem. One possibility is to drop from
consideration any consequences of minor importance to the institution
before proceeding to more systematic evaluation. If various desirable
outcomes are correlated substantially with each other, one need be little
concerned to observe all outcomes since a decision rule which maximizes
one of them would tend to maximize the others, and a simple weighting can
account for the value of those not observed.


In many situations, however, some of the important outcomes have low
intercorrelations. Amount of production per hour may be negatively related
to job tenure, if there are ample opportunities for able people to find
better-paying jobs. There may be a zero correlation between production and
promotability, if supervision calls for different talents than routine
operation. For this reason one cannot be satisfied with validating testing
procedures against a single production criterion, nor with the assumption
that perfect prediction of such a criterion would by itself provide the
basis for an ideal selection system.


Combination of outcomes by empirical analysis


[...] aims of the institution and can reduce other data to it. Brogden and
Taylor (14) argue that in most business manage[...] decisions is the
"dollar criterion" [...] intangible values, and thus the exchange rate for
these intangibles can be inferred [...]. It is not necessary that an
actual criterion be accepted as primary [...]. Some abstract scale of
utility may be the common metric to which all outcomes are commuted.

Value judgments are inevitable at some point in the statement of payoffs,
but the number of judgments employed should be as small as possible to
avoid inconsistencies in the system. Often the number of judgments can be
reduced by accounting procedures, i.e., by careful empirical observation.
If the sole reason for concern with spoilage, for example, is the cost of
reworking or discarding the material, then a careful record of such costs
indicates the seriousness of the spoilage. Brogden and Taylor describe in
detail the logical and observational steps required to locate other
outcomes on the "dollar criterion" scale. Military personnel decisions
perhaps allow for similar accounting, since one might hope to translate
certain proficiency standards into their consequent contributions to
fighting power. The important outcomes of combat morale and combat
efficiency are difficult to observe even in wartime, however, and cannot
be studied empirically during peace. Comprehensive empirical accounting is
likewise impeded by the geographical spread of military operations. It is
therefore apparent that judgment will play a greater part in military than
in business evaluations. The difficulties become even more overwhelming as
we turn to educational and clinical institutions which seek to create
long-lasting changes in people, changes of diverse kinds and with diffuse
effects.

The outcomes of personnel decisions are often expressed on arbitrary
scales (e.g., ratings or standard scores on proficiency measures). For
evaluation it is necessary to interpret these in relation to units of
production or some other absolute scale. A man's contribution depends on
the quality of his performance summed over the duration of that
performance. A statement in such absolute terms is required not only to
take into account differences in job tenure, but also to provide an
absolute estimate of benefit against which to weigh the absolute costs of
the information-gathering procedure.

At first glance costs of testing and interviewing appear trivial when
compared to gains in production over an extended period. More careful
reasoning indicates that these costs cannot be disregarded. Experience
from detailed accounting in connection with industrial inspection
procedures is relevant (35). The cost of an act of inspection is very
small, and the benefit from improved quality is substantial. Nonetheless,
the costs of 100% inspection frequently outweigh the benefits. It was to
restore this balance that sequential methods were invented, and they were
regarded as having such great dollars-and-cents value that for several
years they were classified as a military secret. Common sense itself
provides an argument for considering costs of tests carefully. Since
testing procedures do improve the quality of workers assigned to jobs, and
since validity rises with length, if cost were truly negligible we would
be wise to lengthen tests indefinitely. The absurdity of this proposal
implies at once that there is a point of diminishing returns where costs
begin to outweigh increases in quality. The best strategy for gathering
information can only be determined by precise data on costs of testing.
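
The diminishing-returns argument can be illustrated with a small sketch. It
assumes, purely for illustration, that validity grows with test length
according to the usual lengthening formula of classical test theory, that
benefit is proportional to validity, and that cost is proportional to
length. The function names and all numerical values (unit validity .30,
unit reliability .50, a benefit scale of 100, a cost of 1.5 per unit of
length) are hypothetical and are not taken from the text.

```python
import math


def lengthened_validity(k, unit_validity, unit_reliability):
    """Validity of a test k times the unit length.

    Assumed model: r_k = r_1 * sqrt(k) / sqrt(1 + (k - 1) * r_xx),
    which rises with k but levels off at r_1 / sqrt(r_xx).
    """
    return unit_validity * math.sqrt(k) / math.sqrt(1 + (k - 1) * unit_reliability)


def net_gain(k, unit_validity, unit_reliability, benefit_scale, cost_per_unit):
    """Benefit (assumed proportional to validity) minus cost (proportional to length)."""
    benefit = benefit_scale * lengthened_validity(k, unit_validity, unit_reliability)
    return benefit - cost_per_unit * k


def best_length(max_k, unit_validity, unit_reliability, benefit_scale, cost_per_unit):
    """Length (1..max_k) with the largest net gain under the assumed model."""
    return max(
        range(1, max_k + 1),
        key=lambda k: net_gain(k, unit_validity, unit_reliability,
                               benefit_scale, cost_per_unit),
    )


if __name__ == "__main__":
    # Hypothetical figures, chosen only to show the shape of the trade-off.
    for k in range(1, 7):
        print(k, round(net_gain(k, 0.30, 0.50, 100.0, 1.5), 2))
    print("best length with cost:", best_length(20, 0.30, 0.50, 100.0, 1.5))
    print("best length if cost is zero:", best_length(20, 0.30, 0.50, 100.0, 0.0))
```

With a positive cost the net gain peaks at a finite length; with cost set to
zero the sketch always chooses the longest test allowed, which is the
absurd conclusion the passage warns against.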

Inferring values from decisions


The theory of utility scales has been built up from two somewhat different
points of view, one concerned with eliciting direct statements of values
from the decision maker, and one concerned with inferring his values from
the decisions he makes. Either of these approaches can be applied to the
study of personnel decisions.


The inferential approach employs the decision model in an inverse fashion.
If one knows what decision a person has made in a great number of
instances, then it is possible to determine what set of values is most
consistent with these choices. Logically, this requires the assumption
that the decision maker is following the correct strategy (except for
random variation) for some set of values. A simple example of such
reasoning may be taken from a study of a screening test presented
elsewhere (23). This test was evaluated by its authors on the basis of the
proportion of persons at each score who later succeeded or failed; the
limitations of that procedure may be disregarded for the purposes of the
present example. Their tabulation, after smoothing, indicates that in the
recruit population the probability of success at certain score levels is
as follows: at 4, 95%; at 7, 88%; at 9, 70% [...] we observe that [...]
with a success rate of 70% but not one with 50% successes. For him, the
balance point is somewhere around a success rate of 60%. Since, for him,
sixty successes balance forty failures, we can say that the value of a
success (relative to the zero value obtained from the rejects) [...] with
accepting a failure.
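
The arithmetic behind this inference can be sketched as follows. The sketch
assumes, as the passage does, that a reject is worth zero and that the
decision maker is indifferent at the balance point, so that p*V - (1-p)*C
= 0 there, where V is the value of an accepted success and C the cost of an
accepted failure. The function names and the particular units (V = 2,
C = 3) are illustrative assumptions, not figures from the study cited.

```python
from fractions import Fraction


def success_to_failure_value_ratio(balance_success_rate):
    """Return V/C implied by indifference at the given success rate.

    At the balance point p, p*V - (1 - p)*C = 0, hence V/C = (1 - p)/p.
    """
    p = Fraction(balance_success_rate)
    if not 0 < p < 1:
        raise ValueError("balance point must lie strictly between 0 and 1")
    return (1 - p) / p


def expected_value_per_person(success_rate, value_success, cost_failure):
    """Expected payoff of accepting a group, rejects being worth zero."""
    q = Fraction(success_rate)
    return q * value_success - (1 - q) * cost_failure


if __name__ == "__main__":
    ratio = success_to_failure_value_ratio(Fraction(60, 100))
    print("V/C implied by a 60% balance point:", ratio)
    # Any V and C in that ratio reproduce the observed pattern of choices.
    v, c = 2, 3
    print("group at 70%:", expected_value_per_person(Fraction(70, 100), v, c))
    print("group at 50%:", expected_value_per_person(Fraction(50, 100), v, c))
```

Under these assumptions a group with 70% successes has a positive expected
value and a group with 50% successes a negative one, which matches the
pattern of acceptance and rejection described above.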


[...] outcomes ought to be; [...] that the decision maker is acting as if
he accepted those values. The decision maker, confronted with this
information about his operations, [...] decisions. Sarbin (56), in a
well-known [...] is as a constant error or bias. This interpretation is
sound if the quality of a judgment is evaluated by the absolute error of
estimate. But if we hypothesize instead that the judgments are rational,
we seek the evaluation system within which such "biased" judgments
maximize utility. Assuming that the counselor may expect to have a
symmetrically distributed error of estimate, it is sound strategy to have
a mean error greater than zero if he regards overestimates as less serious
than numerically equal underestimates. It is indeed possible to calculate
a set of values for over- and underestimates which would in this way
rationalize any particular "constant error". It then remains to inquire
whether one could reasonably regard underestimates as more serious. A
counselor might contend that the underestimate has serious social
consequences: it discourages the student and causes him to expect a lower
standard of achievement from himself, perhaps to try less hard and to earn
lower grades, perhaps even to drop out of school in view of his
unsatisfactory prospects. The overestimate leads to false hopes, but the
student will remain in school and obtain a valuable education even if his
ultimate grades do disappoint him. Having read this casuistry (see Smith
et al. (58) on the application of this term), the reader has perhaps
already developed for himself a counter-argument to defend the greater
seriousness in college counseling of overestimates. This is precisely the
point of the example. Thousands of years of philosophy have surely
established that no external judge can criticize a person's values on any
grounds save internal inconsistency. If one judge prefers to emphasize one
outcome and a second prefers to emphasize another, there is no scientific
or logical basis to defend one emphasis over the other. The "obvious"
assumption that overestimates are equal in seriousness to underestimates,
i.e., that the numerical magnitude of the error is the criterion of cost,
is itself based on value judgments and has no special justification over
the other possibilities. This reiterates one of our fundamental theses:
when mathematical formulas are used as guides to policy-making they carry
hidden value judgments which the decision maker might be unwilling to
accept if he considered them.

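
A minimal sketch of how a "constant error" can be rationalized is given
below. It rests on assumptions that the text does not make explicit: the
counselor's unbiased error of estimate is taken to be normally distributed
with standard deviation sigma, and the cost of an error is taken to be
proportional to its size, with one rate for underestimates and another for
overestimates. Under those assumptions the loss-minimizing constant error
is a quantile of the error distribution, and any observed constant error
implies a ratio of the two cost rates. All names and numbers are
hypothetical.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def optimal_bias(sigma, cost_under, cost_over):
    """Constant error that minimizes expected cost under the assumed model.

    Positive result = systematic overestimation. The optimum is the
    quantile cost_under / (cost_under + cost_over) of the error distribution.
    """
    if sigma <= 0 or cost_under <= 0 or cost_over <= 0:
        raise ValueError("sigma and both cost rates must be positive")
    quantile = cost_under / (cost_under + cost_over)
    return sigma * _STANDARD_NORMAL.inv_cdf(quantile)


def implied_cost_ratio(bias, sigma):
    """Ratio cost_under / cost_over that would make the observed bias optimal."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    p = _STANDARD_NORMAL.cdf(bias / sigma)
    return p / (1 - p)


if __name__ == "__main__":
    # Hypothetical: underestimates judged twice as costly as overestimates,
    # standard error of estimate of half a grade point.
    b = optimal_bias(sigma=0.5, cost_under=2.0, cost_over=1.0)
    print("optimal constant error:", round(b, 3))
    print("cost ratio recovered from that error:",
          round(implied_cost_ratio(b, 0.5), 3))
```

Equal cost rates give a constant error of zero, which is the usual
"unbiased" rule; that rule is therefore one value judgment among several,
as the passage argues.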
Extensive laboratory experiments have been made to infer value systems
from behavior, chiefly in gambling (31, 65). These studies support the
view that individuals or cultural groups have characteristic patterns of
action which are consistent with different systems of values. All of these
studies may be regarded as developing techniques which could be applied to
important institutional decisions. At best, however, inferential
procedures merely distill out values for examination. To provide a basis
for future conduct, an explicit acceptance or revision of these values by
the decision maker is required.


Explicit judgment

Explicit value judgments might be obtained by asking the person to state
how many utility units each outcome is worth on his personal value scale.
These judgments are, however, quite difficult to make and if the judgments
are obtained repeatedly there will be variation in the evaluation of an
outcome and inconsistency in the information about different outcomes. The
most common procedure to improve the estimation of values is to employ
some method of psychophysical scaling. These methods give a satisfactory
degree of reliability, although it is often necessary to suppress certain
inconsistencies in the data by describing them as "error" (22). Churchman
and Ackoff (19) report the use of such techniques to arrive at statements
of institutional values in business; each of the examples has its parallel
in personnel decisions. One firm required a quality control strategy for
inspection of penicillin packages. There were eight possible types of
defects, and a comparison of their seriousness was required to specify the
inspection procedure so as to minimize overall risk for a given
expenditure of effort. Nine persons carrying responsibility in the
corporation (i.e., institutional spokesmen) participated in a scaling
experiment, and from their responses a composite evaluation was
established. Precisely comparable questions are involved in planning a
personnel testing program where a decision must be made as to the relative
importance of such various aspects of the criterion as carefulness, speed,
ingenuity, stability, and leadership. A second firm described by Churchman
and Ackoff was engaged in long-range planning, and used scaling methods to
compare various "intangible" objectives of company policy. There is
considerable similarity between this problem and that of comparing
educational methods whose outcomes differ both quantitatively and
qualitatively.

The problem for which comparison of outcomes seems of most immediate
importance is that of classification. All schemes for differential
assignment assume at some point that the outcomes for the various jobs are
evaluated on the same scale. Sometimes as a stopgap all criteria are
reduced to standard [...] judgments of the importance of the [...].
Attention will also need to be given to the scaling of any single
criterion. The assumption which is regularly made, that equal units [...]
represent equal increments of value, is demonstrably false in some
instances and open to question in all.


EVALUATION FOR INDIVIDUAL DECISIONS 


[...] need be made between the individ[...]. Where a series of decisions
is to be made, the [...] the correct general decision rule or strategy.
[...] each of these decisions on the same scale of values, and if that
scale is one of equal i[...] be obtained by summation. [...] studies in
the preceding chapters. In the individual decision, it is not possible to
choose between courses of action save on the basis of th[...] pertains. In
a group of students seeki[...], the decision for each must be evaluated on
a dif[...]. Since the student will make a particular choice only once, it
is manifestly impossible to seek a strategy which is superior on the
average, for the average has no meaningful definition. A particular
decision must be evaluated on the basis of the expected outcome and its
value for this individual.

One might ultimately evaluate a decision by its actual outcome, but this
appears inappropriate since factors beyond the ken of the decision maker
influence the ultimate event. He might make a quite correct judgment in
terms of his values and all ascertainable facts, which would nonetheless
prove to be "wrong" when examined with hindsight. One's theory must make a
place for the fact that the consequences of a given decision are almost
never certain. Instead, with each decision is associated a certain
probability distribution of expected consequences. Of two decisions, that
one is preferable for which the total distribution is preferable. Since we
deny the possibility of repeating the decision many times, it is
meaningless to inquire which policy would lead to the greatest total
utility over many applications. The judgment must be made on the
distribution, not on its average.

This is a point worthy of some emphasis, because here again the transfer
of statistical concepts developed for testing scientific hypotheses to
decision-making has led to some misconceptions. In the literature on
counseling, one finds numerous references to the responsibility of the
counselor for helping the client make the right decision. The "right
decision" is almost invariably interpreted as being that course of action
in which his mean expectancy of success is greatest. This viewpoint has
two faults. One is the implicit assumption that one particular decision is
best for all persons having the same pattern of test scores. Our
discussion of individual evaluations implies that two students might draw
different conclusions from the same data and both be correct. Secondly,
the assumption that the mean of the distribution of outcomes is the proper
index must be questioned.

This problem may be considered most satisfactorily if we restrict
ourselves to the simple problem of choice of curricula, where success is
to be judged by grade average. Test scores might predict that a student's
most probable grade average is 2.5 in curriculum A and 3.0 in curriculum
B, 4.0 being the highest grade attainable. Since a choice of one
curriculum may involve denial of important elements in his self-concept,
or may lead toward a career which the student believes he would not enjoy,
it is clear at the outset that success is unlikely to be the only
criterion. When the student's values are taken into account, we may find
that a 2.5 average in A will actually be more rewarding to him than a 3.0
average in B. But assume for the moment that these averages qua averages
are equally appealing; there still remains the question of the probability
distribution. Suppose that the grade average in curriculum A is difficult
to predict, so that there is a large standard error of estimate. Then
perhaps there is a probability of .10 of grades in the interval 3.25 -
3.75 in curriculum A, along with an equal probability of grades in the
interval 1.25 - 1.75. On the other hand (to simplify), the probability may
be 1.00 of a grade between 2.75 and 3.25 in curriculum B. With these
alternatives, one student may prefer the former alternative, having
confidence in his ability to take advantage of the one chance in ten which
would give him a very good record. Another, wishing to avoid risk, would
prefer curriculum B. The choice between distributions, i.e., between
hazards, rests on personality factors and affective responses. In a choice
such as this, it is impossible for anyone save the decision maker himself
to determine the "correct" conclusion.
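
The curriculum example can be put in numerical form as a rough sketch.
Several things here are assumptions added for illustration: outcomes are
measured as deviations from each curriculum's most probable average (so
that the two most probable averages count as equally appealing), the
remaining .80 of probability in curriculum A is placed at its most probable
average, and attitude toward risk is represented by an exponential utility
function with a single parameter. None of these choices comes from the
text.

```python
import math

# Deviation from the most probable grade average -> probability.
CURRICULUM_A = {-1.0: 0.10, 0.0: 0.80, 1.0: 0.10}  # hard to predict
CURRICULUM_B = {0.0: 1.00}                          # essentially certain


def exponential_utility(x, risk_aversion):
    """Assumed utility curve: positive parameter = risk-avoiding,
    negative = risk-seeking, zero = indifferent to risk (utility = x)."""
    if risk_aversion == 0:
        return x
    return (1 - math.exp(-risk_aversion * x)) / risk_aversion


def expected_utility(distribution, risk_aversion):
    """Probability-weighted utility over the whole distribution."""
    return sum(p * exponential_utility(x, risk_aversion)
               for x, p in distribution.items())


def preferred_curriculum(risk_aversion):
    """Return 'A', 'B', or 'indifferent' for a student with this risk attitude."""
    u_a = expected_utility(CURRICULUM_A, risk_aversion)
    u_b = expected_utility(CURRICULUM_B, risk_aversion)
    if math.isclose(u_a, u_b, abs_tol=1e-12):
        return "indifferent"
    return "A" if u_a > u_b else "B"


if __name__ == "__main__":
    for attitude in (-1.0, 0.0, 1.0):
        print(attitude, preferred_curriculum(attitude))
```

The two distributions have the same mean in this sketch, so a rule based on
the mean alone cannot choose between them; the preference changes only with
the student's own attitude toward the hazards.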

This was recognized some time ago in the influential statement of Bordin
and Bixler (8). They point out that the counselor, equipped with facts,
can help the student recognize the probabilities attendant upon each
choice. In contrast to predecessors who spoke of the counselor as telling
the student the correct course of action, they left the full
responsibility for choice upon the student. While their argument was based
on a theory regarding the way a person learns about himself, and perhaps
on a view regarding the ethics of interpersonal relations, we now
supplement their reasoning with the mathematical argument that "the
correct course of action" can be defined only by the person whom the
decision affects.


The literature on counseling has [...] information and the statement of
[...]. However important it is to correct the client's misconceptions
regarding the probable outcomes of various decisions, this [...] side of
the decision process, [...] the decision is whether the cli[...]
[...]able value system. By [...] we mean fully acceptable to himself,
[...] consistent also with his hidden, even repressed [...] has been
written on the counseling process as a realistic self-examination, a
disclosure of aptitudes and weaknesses. Clearly, it must be equally
concerned with bringing the person to full awareness of his own true
values.

Those who have proposed follow-up studies to evaluate the counseling
process have been most concerned with the client's growth in "objective
self-knowledge" (i.e., knowledge of the [...]) [...] with the "success"
[...]ms, partic[...] [...]tions between various wishes and hopes. It seems
worthwhile therefore to study how the client's evaluation system i[...]
process, however [...] one can say whether a given change is "[...] judge
of his own values.


THE IMPORTANCE OF DECISION THEORY 


The applied psychologist may regard test theory as but a means to his
ends, a series of mathematical rules which help him to refine his
procedures. Although test theory does have this function in psychology, it
plays a far more important role as a sort of Gray Eminence, constantly and
silently shaping the ends themselves. For test theory is both a source and
an embodiment of the values which direct operations involving tests.


THE PRACTICAL POWER OF TEST THEORY 


It is not too fantastic to regard test theory as exercising a control over
its field comparable to that of an economic theory or a religious code
over the institutions of a culture. Institutions are formed by the daily
transactions of the culture, and in these transactions the participants
act first of all to satisfy their needs and values. But the code which
they accept brings certain values strongly to mind and thus has a deciding
influence in many choices. Institutions consistent with the ostensible
values receive steady encouragement, and institutions which serve other
values are viewed with suspicion. Institutions for relieving needs which
the code ignores can emerge only by fighting their way against the
current, as it were.

A theory or code, setting forth values and assumptions explicitly, makes 
conformity easy. But it also displays these assumptions in full light where 
their inadequacies and inconsistencies can be seen and repaired. It is the task 
of the philosopher, economist, or political theorist to clarify issues and expose
fundamental faults in the cultural system. All great historical reforms have 
stemmed from such revision of value assumptions. A fundamental revision of 
test theory will affect testing practice in the same manner that a fundamental 
change in political theory affects daily life in the nation that adopts it. 


Test theory exercises its strongest effect by censoring new test
offerings. A new clinical technique may have a dazzling debut, arousing
all manner of fluttering hopes among the young clinicians who flock around
her. But off in the corner of the ballroom are the test critics, severely
observing the goings-on and grimly preparing their report on the young
darling's character and accomplishments. Only recently have the chaperones
been armed with an "official" statement of technical recommendations by
which a young test should be judged, but these standards merely formalize
what has long been conveyed in books on testing, in test reviews, and in
training courses. This code has made and broken reputations. While a
particular test frowned on by the reviewers may attain great popularity
and prosper for a time in a demi-mondaine existence, such popularity is
transient. A stable place in society is awarded only to those tests which
have the critics' sanction.

To speak of social and philosophical aspects of test theory, and to
question the objectivity of test criticism, must astonish those who regard
test theory as but a special type of mathematics. Mathematics we esteem as
eternal and impeccable, one of the few embodiments of universal logic. But
all mathematics rests on postulates, and while the deductions have a
universal and objective character the postulates do not. Postulates are
only a formal description of a situation, and in an applied problem they
describe the situation as perceived by a particular observer. If we can
find a satisfactory description which differs from the traditional one,
mathematics permits us to erect a new theory contradicting the old at many
points.

A new test theory will have numerous practical consequences. It may, for
example, ultimately offer a different formula or procedure for determining
which items should be retained in a test. It will certainly modify the
evaluation of particular testing instruments. But what is of far greater
significance is that it may channel testing effort in new directions, and
open new areas of psychology to cultivation. Traditional theory has
encouraged those types of tests which meet the traditional criteria, and
as a result the possibilities of general intelligence tests and of
achieve[...] high degree, [...]

DECISION THEORY COMPARED TO ALTERNATIVE MODELS


[...] of the [...] testing movement, the psychologist and [...] one
particular description of the testing process. [...] of this theory are
Kelley's Statistical Method (46) [...] Theory of Mental Tests (38). These
volumes offer a whole series of techniques of test construction, formulas
for combining test scores, and theorems regarding the value of tests
singly or in combination. Improvements in test technique during the past
fifty years have been victories over defects exposed by this theory --
over subjectivity in scoring, over inadequate sampling of an ability
domain, over excessive overlap between tests, etc. These important
advances demonstrate the merit of this theory. Since any theory must
emphasize some values at the expense of others, however, it may not give
the appropriate answers to all testing problems, and our alternative
theory has drawn attention to several such inadequacies.

The traditional theory views the test as a measuring instrument intended
to assign accurate numerical values to some quantitative attribute of the
individual. It therefore stresses, as the prime value, precision of
measurement and estimation. The roots of this theory lie in surveying and
astronomy, where quantitative determinations are the chief aim. In
physical science, in the biometrics which was the forerunner of
differential psychology, and to an increasing degree in contemporary
psychological research, instruments are used for estimating quantitative
variables. In pure science it is reasonable to regard the value of a
measurement as proportional to its ability to reduce uncertainty about the
true value of some quantity. The mean square error is a useful index of
measuring power. One cannot contend that one error is more serious than
another of equal magnitude when locating stars or determining melting
points; measurement theory is unobjectionable when applied to such
appropriate situations.

In practical testing, however, a quantitative estimate is not the real
desideratum. A choice between two or more discrete treatments must be
made. The tester is to allocate each person to the proper category, and
accuracy of measurement is valuable only insofar as it aids in this
qualitative decision. This view of testing as an aid to decision making is
not unique to psychology. When a physical measuring instrument is used to
control an industrial process, decision theory is more satisfactory than
measurement theory for its evaluation.

Attention in traditional test theory has centered on a particular
measuring instrument [...]me manner to all persons. One speaks, therefore,
of the validity [...] or error of measurement of a test. When testing is
[...], different information may be required for different individuals.
The conventional questions about "a test" do not apply to a sequential
procedure. Since we cannot speak of the validity of a test which differs
for every person, we must speak of the efficiency of the entire
decision-making procedure.


The unsuitability of measurement theory for practical psychological testing
was hinted at as long ago as 1928, when Hull (43) noted inconsistencies between
measurement theory and the logic of selection problems. This discrepancy
has repeatedly returned to attention, notably in the writings of Taylor and
Russell and Brogden. They considered the selection problem as a choice
between two alternative treatments, and arrived at conclusions inconsistent with
the classical theory. Although these papers dealt with selection as an isolated
problem, we may see it as one special type of personnel decision. These papers,
challenging widely held beliefs as they did, may be regarded as the first ripples
in a cascade of developments that follow when we consider the test in relation
to the decision for which it is used.

Our formulation does not discard measurement theory. On the contrary, we
have specified more accurately where it applies rigorously and where it may
serve as an approximation to the ideal solution. But in every case the nature
of the decision problem must be specified and these specifications must be used
to determine the appropriate mathematical model. The most important
parameters involved are the payoff functions for alternative treatments and the
constraints upon decisions (e.g., the selection ratio). These parameters, together
with the parameters of the tests (validity, reliability, intercorrelation, and

A meter stick or a thermometer is used in a particular experiment to obtain a
number. Over the life of the instrument, all parts of the scale will be used
processes,

is expected to rank persons from best to poorest, and error distorts the
ranking. Since such distortion is "unfair" to the individuals who are ranked lower
than they deserve, testers want to reduce error of measurement. But from a
utilitarian point of view, these errors can be ignored unless they alter the
goodness of whatever decisions are to be made. An attribute which is, in the
abstract, deserving of reward has no bearing on allocation to treatments if it
does not predict differential payoff.


A test designed to be maximally efficient for a particular decision will
freely allow errors to enter if they are irrelevant to that decision. The test
designed on the basis of pure measurement theory devotes testing effort to
irrelevant information, and thus is not as efficient. We would expect, for
example, that a test battery developed by Horst's extension of measurement
theory would not be optimally efficient for classification decisions, since the
theory gives no consideration to quotas and payoff functions. To date the only
attempt to develop tests for a specific decision problem is the work on "peaked"
tests for selection decisions. For selecting persons above a particular level
of ability, it has been proposed to restrict test items to one level of difficulty.
Lord (50) has recently summarized this work; the tests so designed are indeed
a bit more efficient than those designed in the classical manner, but the
difference is trivial in amount. Sequential procedures also sacrifice accuracy at
some places in order to have greater accuracy where it most affects decisions;
when costs of testing are appreciable the sequential method has clear advantages.


Just as the measurement model is not the only possible starting point for
test theory, so the decision model is not the only alternative that might replace
it. One prominent and rather well developed alternative is the discriminant
function. The discriminant model recognizes that discrete alternatives may
confront the tester, regarding his basic task as the allocation of each
individual to the proper one of several categories. It differs from our approach
chiefly in the way payoffs are specified. Each person is said to "belong" in a
particular treatment, and the evaluation of a strategy consists of some suitably
weighted function of the number of erroneous assignments. Sometimes this
way of formulating a problem will be ideal, although it is then possible to
regard it as a special case within decision theory. Another evaluation is, we
believe, more often suitable in personnel classification. We may consider
payoff within a treatment as varying continuously with score, and allocate
people so as to maximize total payoff. This avoids any assumption that
persons fall into homogeneous types, or that payoff is the same for all "correct"
decisions. To put it differently, the discriminant function is concerned entirely
with between-groups variance in test score, whereas our model takes into
account the fact that predictable variance in payoff within groups dictates
information-gathering and assignment strategies.

Another alternative to traditional measurement theory stresses the fact
that test data are qualitative rather than truly quantitative. Ferguson, Guttman,
Loevinger, and others view the test as an instrument for locating persons in
categories. This theory recognizes that the overall validity coefficient of a
test is a less important quality than its power to discriminate at the boundary
lines where decisions are to be made. Each author has proposed an index of
the goodness of a test as a categorizing instrument, and these indices could
be used in place of the error of estimate or the validity coefficient. Test
construction could point toward maximization of such indices. Again, the decision
model provides a vantage point for inspecting the proposal. In a discussion of
the Ferguson-Thurlow approach reported elsewhere (23), we show that this
method of describing the testing process has two defects. It treats all errors
of discrimination as equally serious, even when the order of categories makes
some errors far more serious than others. Secondly, it compares the
decisions made by the test to chance decisions, instead of to the best a priori
strategy. Since the decision model permits us to use the strategy and payoff

obtain approximate answers to his questions

By general ability tests, we think particularly of measures of scholastic
or intellectual ability of the type widely used in predicting school and job per- 
formance. Such tests are general because the attribute they measure is related 
to success in a great number of undertakings. 

In the prevalent theory, the validity coefficient is the cachet of test nobility. 
The coefficient of determination looks scornfully on any test which cannot 
account for twenty per cent or more of criterion variance, and the coefficient 
of forecasting efficiency sets even more haughty standards. The general abil- 
ity test is virtually the only test whose coefficients consistently come within 
the range from .40 to .80, and this fact has made it the most universal of pro- 
cedures for psychological assessment. No new theory alters the empirical 
fact that this type of instrument has high technical quality; does decision 
theory, however, modify the way the instrument is judged? 

For some tests, perhaps, taking into account the importance of a decision
as we do by means of σ_e, would cause us to lower our estimate of the value
of the test. But the criteria on which the general ability test bears are often 
of the highest importance, and indeed are closely related to the person's suc- 
cess and adjustment in his entire life. Furthermore, decision theory argues 
that a test which bears on many decisions makes far more contribution than 
the test which bears on only one, and this is where the general test excels. A
single IQ for a high school student tells teachers of a dozen subjects whether
or not to press for higher accomplishment, tells what occupational range the
student may properly aim toward, helps in decisions about college education,
and so on without end. From this point of view, the traditional theory seems
if anything too restrained in its enthusiastic appraisal of the general ability
test.

Other aspects of decision theory, however, temper this enthusiasm. We
need not dwell on the finding previously summarized (pp. 74-77) that utility
is most often proportional to validity, and that the coefficient of forecasting
efficiency sets indefensible standards. While this result does alter the way
we judge a test of validity .40 as compared to one of validity .20 or .30, the
change is rather minute as compared to other consequences of our theory.

The argument that a testing strategy must be compared to the best a priori
strategy applies with particular force to general tests. If valid decisions can
be made using information already at hand or cheaply obtainable, the test
should be judged by the increase in validity it offers. The zero-order validity
coefficient is relevant only if, without the test, the decision maker would be
forced into chance decisions. Just because the general test deals with
qualities required in a wide range of performances, information on those qualities
can be obtained from past performances. We are told by Professor P. E.
Vernon that in England success in grammar school has been predicted from
properly scaled primary-school ratings with validity coefficients as high as
.80 (for the total range of talent). In a typical study, the best test battery
raised the multiple correlation by less than .05. A validity coefficient of .82
for a general ability test makes it appear spectacularly valuable, but its true
contribution to decision making is scarcely a five per cent improvement over
non-test decisions. In view of the importance of the decision, even this gain
very likely repays the effort of testing. (But see our further discussion below.)
When the estimated value of the test is deflated to proper proportions, however,
one immediately wonders whether tests of rather low validity, bearing on
decisions where substitute data are not available, might have greater social
importance. It is only speculative to suggest that such importance might be found in
predictors of delinquency, emotional disorders, or vocational interests; for a
test to be valuable as a screening device, it must not only predict with some
accuracy, but there must be beneficial treatments available for those singled
out by the test.
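
The two points just made, that utility is proportional to validity and that a
test earns credit only for what it adds to the best a priori strategy, can be
put in numbers. The sketch below is illustrative only. It assumes normally
distributed standard scores, a payoff linear in score, and a fixed selection
ratio, so that the expected gain per person tested is σ_e times r_ye times the
ordinate of the normal curve at the cutting score, less the cost of testing.
The function names and every figure in the example are hypothetical, not
results from the studies cited.

```python
from statistics import NormalDist


def gain_per_person_tested(validity, sd_payoff, selection_ratio, cost=0.0):
    """Expected gain over random selection, per person tested.

    Assumes standard-normal scores and payoff linear in score (an
    assumption of this sketch, not a general result).
    """
    cutoff = NormalDist().inv_cdf(1.0 - selection_ratio)
    ordinate = NormalDist().pdf(cutoff)
    return sd_payoff * validity * ordinate - cost


def incremental_gain(validity_with_test, validity_a_priori,
                     sd_payoff, selection_ratio, cost=0.0):
    """Gain of the testing strategy over the best a priori strategy."""
    with_test = gain_per_person_tested(
        validity_with_test, sd_payoff, selection_ratio, cost)
    a_priori = gain_per_person_tested(
        validity_a_priori, sd_payoff, selection_ratio)
    return with_test - a_priori


if __name__ == "__main__":
    # Hypothetical figures: ratings alone predict with r = .80,
    # ratings plus a test battery with r = .85, half the group accepted.
    base = gain_per_person_tested(0.80, 1.0, 0.5)
    extra = incremental_gain(0.85, 0.80, 1.0, 0.5)
    print(f"gain from ratings alone: {base:.4f}")
    print(f"added by the test:       {extra:.4f}")
    print(f"relative improvement:    {extra / base:.1%}")
```

Under these assumed figures the test adds only a few per cent to the gain
already available from the ratings, however imposing its own validity
coefficient may look when compared with chance.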


In academic decisions, treatments in which payoff is closely related to test
score are certainly available. Yet that very success of the general ability test
as a predictor may imply that the tests are not right for the uses to which they
are put. This question was first raised by the realization that in counseling of
college students the most common problem is a
vocations.

, differential tests of abilities and interests have been sought which
would predict between-treatment differences. The same argument has been
applied to military classification, with a resultant increase in use of
differential predictors.


Decision theory has formalized the problem of payoff in differential
decisions, and extended it in two ways. First, it has been shown that in
classification the value of measuring a dimension depends in a complex manner on
the payoff functions relating outcome to the various dimensions and on the
quotas for the treatments (13). These relations have to date been studied only
superficially, but the results tend to devalue the general test. Second, a
previously unrecognized type of
namely the placement decision.
of classification deci-
re-continuum is involved,
the factors i ra
grouped on the basis of
st, and each group receives a different treatment. Now for each treatment there
is a relation between payoff and score, which for simplicity we assume to be
linear. The slope is determined by the variation of payoff σ_e and the validity
coefficient r_ye for that treatment. The contribution of the test to a placement
decision depends on the differences between slopes for the several treatments,
and therefore the value of a test for sectioning students will be zero if the
slope of the payoff function is the same for all treatments.
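
The dependence of placement value on the difference between slopes can be
checked with a small calculation. The sketch below is a simplified
illustration, not the formal treatment: it assumes two treatments, standard
normal test scores, linear payoff functions whose slopes stand for σ_e r_ye in
each treatment, and no quotas. The best a priori strategy sends everyone to
the treatment with the higher mean payoff; the test strategy sends each person
to whichever treatment pays more at his score. All names and numbers are
hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def placement_gain(intercept_a, slope_a, intercept_b, slope_b):
    """Expected gain per person from placing by test score rather than
    assigning everyone to the treatment with the higher mean payoff.

    Payoff in treatment t at standard score y is intercept_t + slope_t * y.
    """
    d = intercept_a - intercept_b      # difference in mean payoff
    s = abs(slope_a - slope_b)         # difference in slopes
    if s == 0.0:
        return 0.0                     # parallel lines: the test cannot help
    z = d / s
    # E[max(d + s*y, 0)] for standard normal y, added to the payoff of B,
    # is the expected payoff when each person gets his better treatment.
    expected_advantage = d * STD_NORMAL.cdf(z) + s * STD_NORMAL.pdf(z)
    return expected_advantage - max(d, 0.0)


if __name__ == "__main__":
    print(placement_gain(0.0, 0.8, 0.0, 0.8))   # equal slopes
    print(placement_gain(0.0, 0.9, 0.0, 0.7))   # slopes differ by .2
    print(placement_gain(0.5, 0.9, 0.0, 0.7))   # one treatment better on average
```

In this sketch a test that predicts well in both treatments but with equal
slopes contributes nothing, and the contribution also shrinks as one treatment
becomes better on the average for nearly everyone.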
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The correlation of .85, more or less, between prediction and grammar- 
school success in England is far less significant than it seems at first glance,
because a very large correlation would surely be found also in the "modern
secondary" school to which most of the rejectees go. From the viewpoint of
national policy, the aim is to maximize total output. The present scheme does
this only if the slope σ_e r_ye is greater for grammar schools than for modern
secondary schools. The facts to determine this are not available, simply
because it has previously been thought that predictive validity alone is required.
Very likely, σ_e for an unselected range of talent is greater in grammar schools
than in modern secondary schools, and if so the allocation procedure would be
profitable. But the profit is less than the r of .85 suggests. Moreover, better
bases for decision could be invented if differential payoff were held in mind as
the goal. The general ability predicts success in both treatments. There must
be attributes (methods of problem solving? preference for abstract thought?
character traits? interests?) which are more relevant to one mode of
instruction than the other. The slopes of payoff functions for these qualities will be
less steep than for the general test, but the difference in slope from treatment
to treatment and the benefit from testing will be greater.

British secondary school allocation has afforded a useful example of a
placement problem, but it is not unique. Nearly all the everyday uses of ability and
achievement tests in American schools are for placement rather than selection
decisions; our observations apply there as well. And the observation which
applies with greatest force is that the basic data required to determine the
worth of placement tests have never been obtained.

A decision model has forced us to formulate one more concept which bears
on the reputation of ability tests, and that is the notion of adaptive treatment.
American testers have placed themselves at the disposal of institutions
wishing to assign men to predetermined treatments, and have pointed with pride to
the fact that they often can raise the average output by careful selection. But
nearly equal gains can often be attained by another branch of personnel
technology which refuses to accept the treatment as given. The job-simplification
expert and the human engineer seek to fit the job to unselected men. The greater
their success the less the value of selection. The tester has failed to realize
that he is competing with the treatment simplifier. And the latter method is
often the more economical, for his changes may be made permanent while the
tester must evaluate new employees forever.

The true problem is to find the optimum combination of treatment and selection
of persons. This calls, not for rivalry between en
ration.
one can atta
tors) by an infinite variety of treatments which v
dimensions. For every person there is a best treatment, and for every
treatment a best type of person. Only by a study of the payoff surface linking
treatment dimensions and payoff dimensions can one arrive at the best
combination. This calls for a combination of the experimental methods of the job
designer


128 PSYCHOLOGICAL TESTS AND PERSONNEL DECISIONS 


and the differential-correlational methods of the tester. The vast problem 
thus posed is impossible to cast in concrete terms at present, but if we look 


at just one aptitude at a time, we can make some progress. 


For any one type of score such as general ability we have a family of straight 
lines relating score to payoff for various treatments which in its simplest form 
is characterized by its envelope, a second-degree curve (Figure 9). And the 
greater the curvature of this line, the greater the value of adaptation. If the 


curvature is great, adaptation to unselected men gains much of the value testers 


have been claiming for selection. Adaptation may be accomplished by routiniza- 


tion, reduction of the verbal components of training, and other means. Even 


in schools and universities, elimination of intellectual hurdles which are 


irrelevant to the person's later performance could reduce the importance of 


selection on general intellectual characteristics. There are practical limits 


to modification of treatment, but these limits are astonishingly remote. 
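
For a single score, the family of payoff lines and its envelope can be written
out explicitly. The sketch below uses one hypothetical parametrization, chosen
only because its envelope is exactly a second-degree curve: a treatment is
indexed by the slope t of its payoff line, and steeper treatments carry a
penalty of c·t²/2. Nothing in the argument above fixes this form; the names
`cost_rate` and `base` and the numbers are assumptions of the illustration.
Because each line is linear in score, the mean payoff of an unselected group
under a fixed treatment is the payoff at the group's mean score, so the gain
from adapting the treatment is the vertical distance from the present line to
the envelope at that mean.

```python
def payoff(score, treatment, cost_rate=1.0, base=0.0):
    """Payoff line for one treatment: slope `treatment`, intercept
    base - cost_rate * treatment**2 / 2 (hypothetical parametrization)."""
    return base + treatment * score - 0.5 * cost_rate * treatment ** 2


def best_treatment(score, cost_rate=1.0):
    """Treatment whose line touches the envelope at this score."""
    return score / cost_rate


def envelope(score, cost_rate=1.0, base=0.0):
    """Upper envelope of the family of lines: a parabola in score."""
    return base + score ** 2 / (2.0 * cost_rate)


def adaptation_gain(mean_score, current_treatment, cost_rate=1.0):
    """Gain in mean payoff for an unselected group when the fixed
    treatment is replaced by the one best suited to the group mean."""
    return (envelope(mean_score, cost_rate)
            - payoff(mean_score, current_treatment, cost_rate))


if __name__ == "__main__":
    for y in (-1.0, 0.0, 1.0, 2.0):
        print(y, best_treatment(y), envelope(y))
    print(adaptation_gain(mean_score=1.0, current_treatment=0.0))
```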


Putting all the foregoing argument together, we can neither exalt nor debase 
the general ability test. Considering the importance of decisions and the com- 


pound uses of general information introduces favorable factors. The best a 
priori strategy, with or without adaptation, is often much better than a chance 
decision, and neglect of this has caused the test to be overrated. For
placement decisions, the test has surely received much too much credit; it may
even have been used where its true value is negligible.

This analysis makes clear that the question whether a given test is
profitable or unprofitable is meaningless. The value of a test is great for one
decision and small even for other decisions involving the same treatment. A
generalized study of payoff functions for a type of test can tell us far more than
we now know regarding its usefulness. But it has no universal value. Its con-


THE VALUE OF WIDE-BAND PROCEDURES


Of all the implications of decision theory, perhaps the most dramatic is
the new interpretation it offers for wideband procedures. In counseling and
guidance, one must ordinarily help the person answer several questions at
once, and each answer involves somewhat different types of information. The
counselor may obtain information narrowly focussed on one dimension, or may
use a procedure which covers many areas. Bandwidth, i.e., greater coverage,
is purchased at the price of lowered fidelity or dependability. For any
decision problem there is an optimum bandwidth. This conclusion departs from
conventional theory, which assumes that it is always desirable to maximize
dependability.
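
The notion of an optimum bandwidth can be illustrated with a small
calculation. The sketch below rests on assumptions of our own choosing rather
than on any result stated above: a fixed amount of testing time is divided
equally among the dimensions covered, reliability at a given length follows
the Spearman-Brown formula, validity is proportional to the square root of
reliability, and the value of information on each dimension is its importance
weight times that validity. The weights and other figures are hypothetical.

```python
from math import sqrt


def validity_at_length(length, unit_reliability, true_validity=1.0):
    """Validity of a measure of the given length (in time units), using
    the Spearman-Brown formula for reliability at that length."""
    reliability = (length * unit_reliability
                   / (1.0 + (length - 1.0) * unit_reliability))
    return true_validity * sqrt(reliability)


def total_value(bandwidth, weights, total_time, unit_reliability):
    """Summed value when the `bandwidth` most important dimensions share
    the testing time equally. `weights` must be in descending order."""
    length = total_time / bandwidth
    validity = validity_at_length(length, unit_reliability)
    return sum(weights[:bandwidth]) * validity


def best_bandwidth(weights, total_time, unit_reliability):
    """Number of dimensions that maximizes total value."""
    return max(range(1, len(weights) + 1),
               key=lambda k: total_value(k, weights, total_time,
                                         unit_reliability))


if __name__ == "__main__":
    # Hypothetical importance of six questions the counselor might face.
    weights = [1.0, 0.6, 0.4, 0.2, 0.05, 0.05]
    for k in range(1, 7):
        print(k, round(total_value(k, weights, 6.0, 0.5), 4))
    print("optimum bandwidth:", best_bandwidth(weights, 6.0, 0.5))
```

With these assumed figures neither the narrowest nor the widest procedure is
best; total value rises as coverage is extended to the more important
questions and falls once further coverage buys little and costs dependability
on everything else.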

Among the important wideband procedures are the interview, the projective
technique, the essay examination, and analysis of patterns of successes and
failures on ability tests. Each of the wideband devices is unsatisfactory, by
the usual standards of predictive efficiency and reliability. Our work suggests, 
however, that the negative evidence may not bear on the usefulness of the pro- 
cedures for the function they best fulfill. 

Let us begin with the interview, a technique which is universally accepted 
by those who make personnel decisions. No matter how elaborate a testing 
program may be, it is supplemented by an interview wherever the cost of 
interviewing permits. Indeed, some of the very psychologists who report neg- 
ligible validity for interview judgments themselves insist on interviewing the 
prospects when they have a vacancy to fill in their own offices. The evidence, 
however, is preponderantly negative. Interviewers' judgments disagree with 
each other and with objective criteria. Why, in the face of such evidence, does 
the interview retain its popularity? 

If the interview were omitted, estimates of intelligence and other qualities 
which can be judged by objective means would be more valid than when
contaminated by interviewers' impressions. By omitting the interview, however, the
employer would give up the possibility of obtaining information on characteris- 
tics which formal measurement procedures do not reveal. An interview per-
mits the decision maker to cover the length and breadth of the subject's his-
tory and character. "How did you get along with your last employer? What 
aspects of your work interested you most? Tell me about your family. . . ."
In this varied conversation, significant facts can come to light which no struc- 
tured test or questionnaire could reveal. A reluctance to talk about a particu- 
lar job experience perhaps discloses a failure. Other remarks hint at a com- 
petitive attitude which would be an aid in one job and a hazard in another. The 
personality traits which receive chief attention as one interview develops may 
be brushed over in an interview with another person. The virtue of the inter- 
view is that it can turn in any of hundreds of directions, following leads in a 
way that the structured narrowband procedure cannot. 

Quite similar comments could be made regarding the projective technique, 
which touches on abilities, interests, social relations, sexual attitudes, methods 
of thinking, and so on through hundreds of characteristics. Any given record 
says next-to-nothing about some of these areas, but each record presents a 
few striking individual features of undoubted importance to the clinician. The 
essay test has a smaller range. As a measure of factual knowledge a history 
paper compares more or less unfavorably with an objective test in the same 
area. The essay test, however, gives information on spelling, organization of 
ideas, manner of attack on a problem, imagination and originality, special 
biases, and so on. The teacher scoring such a paper has a better chance to 
"get to know" the student than does the teacher scoring an objective test. 

The wideband procedure sheds light on many different decisions. The 
employment interview does not bear only on whether to hire the person. It 
suggests how to assign him, how to supervise him, what weaknesses must be
allowed for, what are his prospects for promotion, and so on. The projective
test helps the clinician to decide regarding dozens of steps of treatment. And
the essay test may provide numerous suggestions about educational procedures.


One may contrast two different aspects of any information source:
exhaustiveness and dependability. Exhaustiveness is the extent to which the
information offered covers the ground we wish to know about. Dependability is the
extent to which the information offered is true. Only a precise evaluation of a
given procedure can indicate whether it strikes the proper balance between
the two desirable qualities. Though any one fact or judgment from a wideband
procedure is undependable, the procedure as a whole contributes more in some
circumstances than a narrowband procedure. The wideband procedure can
perhaps improve every decision that is to be made. The narrowband procedure
ing them to be made on a priori grounds.


Even where a wideband technique does contribute more than a precise
narrowband instrument, the decision maker should be reluctant to rely on
untrustworthy information. Usually it is practical to make the wideband instrument
the first stage in a sequential process, arriving at reversible rather than
terminal decisions. A sequential process makes ideal use of fallible data, trusting
them only to the extent that they deserve.


Actually, in most present practice,
her than for final decisions.
possibility of underlying emotional insecurity which can be investigated by inti
a personality test, or information from acquaintances.
salient observation is
rt courses. The next
des by questioning his instructors, and perhaps to apply tests of artistic
talent. In counseling of a student referred for low grades, the interview would
suggest whether the critical questions relate to lack of interest in his
curriculum, poor study methods, an undesirable attitude, poor reading, an
emotional problem, or something else. Dependable measuring instruments exist for
several of these areas, and would form the next stage in fact-finding.
similar detail the sequential nature of employment interviewing, diair
nosis, or educational diagnosis.


In the ideal situation, the decision maker comes to no conclusion on the
basis of the wideband procedure. It directs his subsequent observations, and
perhaps suggests a tentative treatment which will enhance his opportunity to
observe what is pertinent. Ultimately, he hopes to obtain enough data for a
highly dependable terminal decision.

The merit of the broad survey as a first step in decision making is obvious.
In a hundred counseling interviews attention is directed to a hundred different 
questions, each of which appears salient in only a fraction of the cases. All 
this ground could not possibly be covered by systematic, precise techniques. 
The survey is fallible; it misses some points, and opens up false leads. But 
the subsequent questioning reveals the false leads as false, and there are fewer 
missed points than there would be if a narrowband procedure had been used. 

The wideband procedures, though condemned on the basis of validation 
research, have won a large and loyal following who claim to have had satisfac- 
tory experience with the methods. We submit that the enthusiasm of their 
users is the result of satisfactory experience with them in their sequential 
application. The critic, on the other hand, has most often applied standards 
appropriate for evaluating terminal decisions. The interview gives an unde- 
pendable impression of intelligence. Judgments of aggression based on the 
Rorschach disagree too often with observed social behavior. Psychiatrists 
interviewing the same person frequently arrive at quite different estimates of 
emotional stability. Qualitative analysis of Binet or Wechsler protocols is a 
poor way to measure flexibility or creativeness. The evidence is that estimates 
from wideband techniques are rather unreliable and have low validity. Used 
sequentially, however, they draw attention to facts which would otherwise be 
missed. The price of pursuing false leads is one which the decision maker is 
often willing to pay. 

In recent years controversy has centered on projective methods, which 
have been an active subject of research. The proponents of these methods 
have tended to reject the negative evidence, claiming that the techniques per- 
mit highly valid judgment. There is no objective support for the claim that 
projective data are a valid basis for terminal decisions. Supporters of pro- 
jective tests have seemingly rested their case on the wrong grounds. We sug- 
gest that they have argued for the conventional validity of their tests because 
that has been the only passport to respectability in test theory. Decision theory 
points out the existence of quite another virtue in tests and the available evi- 
dence is not inconsistent with the claim that projectives have this virtue. 

We believe that proponents and critics of projective tests could agree on 
the following statements: Inferences from projective tests are often wrong, 
but right more often than chance would allow. Projective techniques frequently 
suggest hypotheses about the individual which, when they are confirmed, have 
great practical importance. Many of the suggestions and hypotheses about idio- 
syncratic characteristics of the individual are subsequently proved baseless. 
The theory and technique of projective testing and interpretation can be greatly 
improved. 

Reconciliation of the objective evidence with the clinician's enthusiasm is 
made possible by the fact that the evidence disproves claims which should 
never have been made, but supports claims consistent with much clinical prac- 
tice. Our argument should be compared with Meehl's discussion (53) of the
value of clinical judgment in diagnosis. His conclusion that clinical methods
are chiefly useful for constructing hypotheses to be confirmed by further
investigation is strikingly similar to ours.

Placing wideband techniques in a logical framework encouraging to their
advocates does not absolve them from all criticism. There is some risk that
our clarification of logical possibilities will be taken as an endorsement of
present practices. Nothing could be farther from our intentions. Projective
and other wideband methods are used in indefensible ways which should be
criticized and corrected. We hope that our theoretical analysis directs
criticism toward the essential points and away from false issues.

Recognizing the much greater value of wideband techniques when used
sequentially will, we hope, discourage the use of these methods for terminal
decisions. Research on these methods should consider their usefulness as a
way of focussing investigation. It is necessary to determine how often the
leads suggested by a given wideband procedure are fruitful. Considering this
along with the importance of the findings and the cost of the procedure is the
only legitimate basis for evaluating the method.
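
The three quantities just named (how often leads are fruitful, how important
the findings are, and what the procedure costs) combine in a simple
expected-value calculation. The sketch below is a bare illustration under
assumptions of our own: every lead is followed up at a fixed cost, a confirmed
lead has a fixed value, and false leads cost only the follow-up. The function
names and figures are hypothetical.

```python
def net_value_per_case(leads_per_case, fruitful_rate, value_per_confirmed_lead,
                       followup_cost_per_lead, procedure_cost):
    """Expected net value of a wideband procedure used as the first
    stage of a sequential process, per case examined."""
    confirmed = leads_per_case * fruitful_rate
    return (confirmed * value_per_confirmed_lead
            - leads_per_case * followup_cost_per_lead
            - procedure_cost)


def break_even_fruitful_rate(leads_per_case, value_per_confirmed_lead,
                             followup_cost_per_lead, procedure_cost):
    """Proportion of fruitful leads at which the procedure just pays."""
    total_cost = leads_per_case * followup_cost_per_lead + procedure_cost
    return total_cost / (leads_per_case * value_per_confirmed_lead)


if __name__ == "__main__":
    # Hypothetical: three leads per interview, one in four confirmed.
    print(net_value_per_case(3, 0.25, 20.0, 2.0, 5.0))
    print(break_even_fruitful_rate(3, 20.0, 2.0, 5.0))
```

On such a reckoning a procedure most of whose leads prove false may still be
worth using, provided the confirmed findings are important and the follow-up
is cheap.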


gests that it would be profitable to
In the TAT, for example, better information
uld presumably be obtained by increasing the proportion of plates involving
parent figures. In many diagnostic problems the attendant sacrifice of
information about other aspects of the person would surely be justified.


The true issue is not whether wideband methods are good or bad. The
question is, for any given decision problem what is the best information-gathering
procedure? If a projective technique would be useful, just how should it be
designed and interpreted to give the greatest help? Nothing in our argument
leads us to think that any procedure can conceivably give an accurate analysis
of "the whole personality". Instead, it is ne
a limited quantity of information the most intelligent possible decision. The
problem is
e which, in the time available, offers the greatest yield of
important, relevant, and interpretable information.


A FINAL WORD 


Our search for a coherent theory of decision-making on the basis of test
information has led in many directions. Wherever we turn, we find
questions. Research of many types is needed, ranging from simple fact finding
to major inventions. Mathematical studies of test design and allocation
strategies, new types of empirical studies of payoff surfaces and of validity, and
even philosophical studies of evaluation problems have a part to play in extend-
ing test theory. Along with these theoretical developments we perceive many 
possibilities of replacing present measuring instruments with new ones: place- 
ment tests of superior differential validity, personality surveys with appro- 
priate bandwidth for particular tasks, collections of brief tests for sequential 
classification, and so on. Our final conclusion may be optimistic or pessimis- 
tic, but it is inescapable. The test theory developed to date covers only a 
small corner of the domain within which the decision maker operates, and in 
only a few specialized instances have we discovered the testing practices 


which will most increase the utility of personnel decisions. 


Appendix 1


UTILITY IN FIXED-TREATMENT SELECTION AND PLACEMENT 


Definition of the selection problem 


The following development formally defines the selection problem where
information is univariate and the treatment to be given accepted men is
specified independently of the test. The argument indicating the relation of utility
to test validity is a restatement of that presented by Brogden (10, 12), and
Cochran (20). We make the following assumptions:

1. Decisions are made regarding an indefinitely large population of
   persons. This "a priori population" consists of all applicants after
   screening by any procedure which is presently in use and will continue
   to be used.

2. Regarding any person, i, there are two possible alternative decisions:
   accept ($t_A$) and reject ($t_B$).

3. For every person there is a test score $y_i$, which has zero mean and unit
   standard deviation.

4. For every person there is a payoff $e_{it_A}$, which results when the person
   is accepted. This payoff has a linear regression on test score. The test
   will be scored so that $r_{ye}$ is positive.

5. When a person is rejected, the payoff $e_{it_B}$ results. This payoff is
   unrelated to test score, and may be set equal to zero.

6. The average cost of testing a person on test y is $C_y$, where $C_y > 0$.

7. The strategy will be to accept high-scoring men in preference to
   others. A cutoff $y'$ will be located on the y continuum so that any
   desired proportion $\phi(y')$ of the group falls above $y'$. Above that point
   probability of acceptance is 1.00; below it, zero. Cochran shows that
   such a strategy is optimal for selection with fixed quota.


From assumption (4), the expected payoff from accepting a man is

$$e_{yt_A} = \sigma_e r_{ye}\, y + e_{ot_A} \tag{1.1}$$

Here, $\sigma_e$ and $r_{ye}$ are computed on the a priori population, and $e_{ot_A}$ is the
payoff expected when an average man in that population is assigned to treatment
$t_A$.


Utility as a function of validity in selection 


The expected utility of a set of decisions is obtained by summing $e_y$, the
expected payoff, over all persons and subtracting cost of testing. Let U
represent the total utility resulting from decisions about N persons, cost of
testing being taken into account.

$$U = \sum_{i=1}^{N}\left(p_{t_A/y_i}\, e_{y_i t_A} + p_{t_B/y_i}\, e_{y_i t_B} - C_y\right) \tag{1.2}$$

Using assumption (5) and (1.1),

$$U = \sum_{i=1}^{N} p_{t_A/y_i}\left(\sigma_e r_{ye}\, y_i + e_{ot_A}\right) - N C_y \tag{1.3}$$

If individuals in any y array are accepted, their total utility is

$$U_{t_A/y} = N p_y\left(\sigma_e r_{ye}\, y + e_{ot_A}\right) - N C_y \tag{1.4}$$

From assumption (7), $N\int_{y'}^{\infty} p_y\, dy = N\phi(y')$. Therefore, summing over all
values of y,

$$U = N\sigma_e r_{ye}\int_{y'}^{\infty} p_y\, y\, dy + N\phi(y')\, e_{ot_A} - N C_y \tag{1.5}$$

The a priori utility $U_o$ is the utility resulting when $N\phi(y')$ men are selected
from the population by chance:

$$U_o = N\phi(y')\, e_{ot_A} \tag{1.6}$$

The gain in utility from test y is

$$\Delta U = U - U_o = N\sigma_e r_{ye}\int_{y'}^{\infty} p_y\, y\, dy - N C_y \tag{1.7}$$

It will be noted that, for any a priori distribution, gain in utility is a linear
function of $r_{ye}$. We shall be chiefly concerned with the case where we can
assume a normal distribution of y. We shall let $\xi(y)$ and $\phi(y)$ represent the
ordinate and area (upper tail) corresponding to any value of y under this
assumption. Then (1.7) becomes

$$\Delta U = N\sigma_e r_{ye}\,\xi(y') - N C_y \tag{1.8}$$

which is the net gain from testing N men. Hereafter, we shall divide (1.8) by
N and let $\Delta U$ represent the gain in utility per man tested.
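Equation (1.8), divided by N, can be evaluated directly once the normal ordinate and upper-tail area are available. The following is a minimal sketch rather than part of the original development; the function names and the numerical figures are hypothetical, chosen only for illustration.

```python
import math

def normal_ordinate(y):
    """Height of the standard normal curve at y (the text's xi)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def upper_tail_area(y):
    """Proportion of a standard normal population above y (the text's phi)."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def gain_per_man_tested(sigma_e, r_ye, cutoff, cost):
    """Equation (1.8) divided by N: sigma_e * r_ye * xi(y') - C_y."""
    return sigma_e * r_ye * normal_ordinate(cutoff) - cost

# Hypothetical figures: payoff s.d. of 1000 units, validity .50,
# cutoff at the mean, testing cost of 10 units per man.
example_gain = gain_per_man_tested(1000.0, 0.50, 0.0, 10.0)
selection_ratio = upper_tail_area(0.0)
print(round(example_gain, 2), selection_ratio)
```

With the cutoff at the mean, half the applicants are accepted and the ordinate is at its largest, so this is the cutoff at which gain per man tested is greatest for a given validity and cost.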

As we proceed to study a variety of types of decision, we shall employ
$B_{yd}$ as a general symbol for the contribution of a particular test y to a
specified decision d, apart from cost of testing. In fixed-treatment selection,
$B_{yd} = \sigma_e r_{ye}\,\xi(y')$, so that the gain in utility per man tested is

$$\Delta U = B_{yd} - C_y \tag{1.9}$$

Rewriting (1.8) in this form will simplify several later mathematical
developments.

The gain in utility per man accepted is

$$\frac{\Delta U}{\phi(y')} = \sigma_e r_{ye}\,\frac{\xi(y')}{\phi(y')} - \frac{C_y}{\phi(y')} = \sigma_e r_{ye}\,\bar{y} - \frac{C_y}{\phi(y')} \tag{1.10}$$

since the average test score $\bar{y}$ for accepted men equals $\xi(y')/\phi(y')$.

Region where test is admissible. From (1.8) it is evident that $\Delta U > 0$
for any $y'$ such that

$$\xi(y') > \frac{C_y}{\sigma_e r_{ye}} \tag{1.11}$$

For any positive value of $C_y/\sigma_e r_{ye}$ there are two values of $y'$, symmetrically
located about y = 0, beyond which selection with the test is less beneficial
than random selection. Outside these limiting values, use of the test is an
inadmissible strategy, in Wald's sense.

Optimum strategy. The maximum utility per person accepted is obtained
by differentiating (1.10) with respect to $\phi(y')$:

$$\sigma_e r_{ye}\,\frac{y'\phi(y') - \xi(y')}{[\phi(y')]^2} + \frac{C_y}{[\phi(y')]^2} = 0 \tag{1.12}$$

$$\xi(y') - y'\phi(y') = \frac{C_y}{\sigma_e r_{ye}} \tag{1.13}$$

Only one value of $y'$ satisfies this equation. Equation (1.13) is meaningful only
when $\xi(y') > C_y/\sigma_e r_{ye}$, since testing is otherwise not profitable. Therefore
$y'\phi(y')$ must be positive, and $y'$ must be greater than zero.

When there is a fixed numerical quota $N\phi(y')$, setting the optimum $y'$
according to (1.13) determines the optimum number of persons to be tested.
For maximum utility with a fixed quota, enough men should be tested so that
the cutting point is above the mean.
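Equation (1.13) has no closed-form solution for the cutoff, but its left side falls steadily as the cutoff rises, so a simple bisection locates the single root. The sketch below is an illustration and not a procedure given in the text; the function names, the bracketing interval, and the cost ratio of .10 are assumptions.

```python
import math

def normal_ordinate(y):
    """Height of the standard normal curve at y (the text's xi)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def upper_tail_area(y):
    """Proportion of a standard normal population above y (the text's phi)."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def optimum_cutoff(cost_ratio, lo=-8.0, hi=8.0, tol=1e-12):
    """Solve xi(y') - y' * phi(y') = C_y / (sigma_e * r_ye) by bisection."""
    def excess(y):
        return normal_ordinate(y) - y * upper_tail_area(y) - cost_ratio
    if excess(lo) < 0.0 or excess(hi) > 0.0:
        raise ValueError("cost ratio lies outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid   # left side still too large: the root lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

cost_ratio = 0.10                      # hypothetical C_y / (sigma_e * r_ye)
cutoff = optimum_cutoff(cost_ratio)
men_tested_per_opening = 1.0 / upper_tail_area(cutoff)
print(round(cutoff, 3), round(men_tested_per_opening, 2))
```

Because the cost ratio here is below the ordinate at the mean (about .399), the root is positive, in agreement with the statement that the cutting point should lie above the mean.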


Utility in placement with fixed quotas 


The above relationship may be generalized to consider placement decisions
where any number of fixed treatments are to be used.

For each of the n treatments there is a linear payoff function, whose
parameters will vary from treatment to treatment. The y continuum is to be
divided into n segments, each containing some predetermined proportion of the
cases. Each segment is bounded by two cutting scores, $y_t'$ and $y_t''$, the latter
being the upper boundary. $y_t''$ is also, of course, the lower boundary for the
next treatment, t+1. $\phi(y_t') - \phi(y_t'')$ is the quota for treatment t.

If all men above $y_t'$ were assigned to treatment t the utility per man tested
would be (from (1.5), invoking the normal assumption),

$$\sigma_e r_{ye_t}\,\xi(y_t') + e_{ot}\,\phi(y_t') - C_y \tag{1.14}$$

A similar expression may be written in terms of $y_t''$. The utility when men
between $y_t'$ and $y_t''$ are so assigned is, by subtraction,

$$U_t = \sigma_e r_{ye_t}\left[\xi(y_t') - \xi(y_t'')\right] + e_{ot}\left[\phi(y_t') - \phi(y_t'')\right] - C_y \tag{1.15}$$

We may write $\Delta\xi_t$ and $\Delta\phi_t$ for the bracketed terms, so that

$$U_t = \sigma_e r_{ye_t}\,\Delta\xi_t + e_{ot}\,\Delta\phi_t - C_y \tag{1.16}$$

Over all treatments,

$$U = \sum_t \sigma_e r_{ye_t}\,\Delta\xi_t + \sum_t e_{ot}\,\Delta\phi_t - C_y \tag{1.17}$$

The a priori strategy with fixed treatment is to assign randomly selected
men to each treatment to fill the quota. From (1.6), for each treatment

$$U_{ot} = e_{ot}\,\Delta\phi_t \tag{1.18}$$

Subtracting,

$$\Delta U = \sum_t \sigma_e r_{ye_t}\,\Delta\xi_t - C_y \tag{1.19}$$

This again may be written as $B_{yd} - C_y$, $B_{yd}$ thus being defined in
fixed-treatment placement by the first term of (1.19).


Utility in placement with adaptive quotas 


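If treatments are fixed and arranged according to the slopes of their
payoff functions, but quotas are not specified, it is possible to increase utility by
altering the quotas. Maximizing (1.17) with respect to $y_t'$ and using t-1 and t
to indicate two adjacent treatments such that $y_{t-1}'' = y_t'$,

$$\frac{\partial U}{\partial y_t'} = -\sigma_e r_{ye_t}\, y_t'\,\xi(y_t') - e_{ot}\,\xi(y_t') + \sigma_e r_{ye_{t-1}}\, y_t'\,\xi(y_t') + e_{o(t-1)}\,\xi(y_t') = 0 \tag{1.20}$$

Simplifying,

$$\sigma_e r_{ye_{t-1}}\, y_t' + e_{o(t-1)} = \sigma_e r_{ye_t}\, y_t' + e_{ot} \tag{1.21}$$

or

$$y_t' = \frac{e_{o(t-1)} - e_{ot}}{\sigma_e\left(r_{ye_t} - r_{ye_{t-1}}\right)} \tag{1.22}$$

That is, the optimum cutting score is the point at which the payoff functions
of the two adjacent treatments intersect. If such computations for any
treatment yield a value of $y_t'' < y_t'$, that treatment drops out of the series of
treatments used. Its quota is zero.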
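The rule in (1.22), together with the dropping of treatments whose upper boundary falls below their lower boundary, can be sketched as follows. This is an illustrative routine, not one given in the text: the function names and the three treatments are hypothetical, and the treatments are assumed to have distinct validities.

```python
def crossing_score(sigma_e, r_lower, e0_lower, r_upper, e0_upper):
    """Equation (1.22): test score at which two linear payoff functions meet."""
    return (e0_lower - e0_upper) / (sigma_e * (r_upper - r_lower))

def adaptive_cutting_scores(sigma_e, treatments):
    """Return the treatments that keep a positive quota and the cuts between them.

    `treatments` is a list of (r_ye_t, e_ot) pairs with distinct r_ye_t.
    """
    ordered = sorted(treatments)     # ascending validity, hence ascending slope
    kept, cuts = [], []              # cuts[i] separates kept[i] from kept[i + 1]
    for r, e0 in ordered:
        while kept:
            cut = crossing_score(sigma_e, kept[-1][0], kept[-1][1], r, e0)
            if cuts and cut <= cuts[-1]:
                # Upper boundary of the previous treatment is below its lower
                # boundary, so that treatment drops out of the series.
                kept.pop()
                cuts.pop()
            else:
                cuts.append(cut)
                break
        kept.append((r, e0))
    return kept, cuts

# Hypothetical treatments as (validity, payoff intercept), with sigma_e = 1.
kept, cuts = adaptive_cutting_scores(1.0, [(0.2, 0.0), (0.3, -0.05), (0.5, -0.1)])
print(kept, [round(c, 4) for c in cuts])
```

In this example the middle treatment is never the best choice at any score, so it receives a zero quota and a single cut remains between the first and third treatments.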


The best a priori strategy is to assign all men to whichever of the allowed
treatments gives greatest utility when y = 0. We will designate it $t_o$.
Considering all treatments, if adaptive quotas are used,

$$\Delta U = \sum_t \sigma_e r_{ye_t}\,\Delta\xi_t + \sum_t \left(e_{ot} - e_{ot_o}\right)\Delta\phi_t - C_y \tag{1.23}$$


A PAYOFF SURFACE DEPENDING ON A UNIVARIATE APTITUDE


To consider problems involving choice among many treatments, we develop
a general function relating payoff to a univariate aptitude. Parameters are
employed such that this surface will fit a variety of actual data.

It was pointed out in Chapter II that when many scores are available it is
possible to perform a factor analysis, deriving the orthogonal factors
$s_1, s_2, \ldots$ Of these factors, the ones which determine payoff under any of the
treatments are referred to as aptitude factors. In this appendix we restrict
ourselves to the case where all tests under consideration measure only one
aptitude dimension s (although they may also measure common factors not
related to payoff). We assume that s has zero mean, unit standard deviation,
and normal distribution.

We assume that the expected payoff under any treatment is a linear
function of s:

$$e_{st} = \sigma_{e_t} r_{se_t}\, s + e_{ot} \tag{2.1}$$

Our problem is to examine the possible changes in payoff when treatments
are allowed to vary. As discussed in Chapter III, treatments may vary along
many dimensions. As the treatment varies, either $\sigma_{e_t}$ or $r_{se_t}$ or both may
vary, and $e_{ot}$ may vary. Change in either of the first two parameters affects
the slope of the payoff function, and we therefore introduce a slope parameter
$m_t = \sigma_{e_t} r_{se_t}$. If two treatments have the same slope, it will always be
preferable to assign men to the treatment for which $e_{ot}$ is greater. For each
slope $m_t$, then, only one of the set of treatments need be considered.

We desire to postulate a relatively simple form for the payoff surface and
therefore assume that the intercepts $e_{ot}$ vary in the following manner:

$$e_{ot} = c + b\,m_t - a\,m_t^2, \quad \text{where } a > 0 \tag{2.2}$$

This assumption is adopted because it is the simplest function which defines
a non-trivial case. While b is likely to be greater than zero, this is not
required. Substituting in (2.1), we have the equation for the surface relating
expected payoff to aptitude s:

$$e_{st} = m_t\, s + c + b\,m_t - a\,m_t^2 \tag{2.3}$$

[...] is available. For any value of s, there is a treatment t which yields the
greatest payoff. Maximizing $e_{st}$ with respect to $m_t$,

$$\frac{\partial e_{st}}{\partial m_t} = s + b - 2a\,m_t = 0 \tag{2.4}$$

$$m_t = \frac{s + b}{2a} \tag{2.5}$$

Substituting (2.5) in (2.3), the payoff under the treatment best suited to s is

$$e_{st} = c + \frac{(s + b)^2}{4a} \tag{2.6}$$
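The payoff surface of (2.3) and the best slope of (2.5) are easy to check numerically. The sketch below is offered only as an illustration; the function names and the parameter values a = .5, b = 1, c = 10 are hypothetical.

```python
def payoff(s, m, a, b, c):
    """Equation (2.3): expected payoff at aptitude s under a treatment of slope m."""
    return m * s + c + b * m - a * m * m

def best_slope(s, a, b):
    """Equation (2.5): slope of the treatment giving the greatest payoff at s."""
    if a <= 0:
        raise ValueError("the surface has a maximum only when a > 0")
    return (s + b) / (2.0 * a)

a, b, c = 0.5, 1.0, 10.0             # hypothetical surface parameters
for s in (-1.0, 0.0, 1.5):
    m_star = best_slope(s, a, b)
    # Payoff at the optimum equals c + (s + b)^2 / 4a.
    print(s, m_star, payoff(s, m_star, a, b, c), c + (s + b) ** 2 / (4.0 * a))
```

For s = 0 the best slope is b/2a and the payoff is c + b²/4a, which is the value used for the a priori strategy in the appendix that follows.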
UTILITY WITH ADAPTIVE TREATMENT


It will be advantageous to determine the utility from placement with
adaptive treatment before considering adaptive selection. The opposite order is
employed in describing the findings in the text.


Placement with fixed quotas


It is assumed that a large number of treatments are available, and that a
particular test is under consideration. The aptitude s is common to test and
treatments, and the payoff function is described by (2.3). The decision maker
is to divide persons into a certain number of groups, the proportion to be placed
in each group being specified. Each group will receive a different treatment,
the decision maker being allowed to select the treatment best fitted to the
estimated aptitude of each group. Throughout this discussion, utility will refer to
utility per man tested; that is to say, N is dropped from all equations.

If the test is not given, the best a priori strategy under these conditions is
to divide men at random into groups, and to assign all groups to the treatment
best when s = 0. From (2.5), this treatment $t_o$ is such that

$$m_{t_o} = b/2a \tag{3.1}$$

This is not the same $t_o$ as appeared in (1.23). If $\phi(y')$ persons, randomly
selected, are assigned to $t_o$ it follows from (2.6) that the a priori utility per
man tested is

$$U_{ot_o} = \phi(y')\left(c + b^2/4a\right) \tag{3.2}$$

When the test y is used as a basis for dividing the group, utility is increased.
As in Appendix 1, we assume that the cutting scores $y_t'$ and $y_t''$ are fixed so that
$\Delta\phi_t = \phi(y_t') - \phi(y_t'')$ is the quota for treatment t.

The average test score of men between $y_t'$ and $y_t''$ is
$\left[\xi(y_t') - \xi(y_t'')\right]/\Delta\phi_t$, or $\Delta\xi_t/\Delta\phi_t$.
Assuming a linear regression of s on y, the average value of s in this group
is $r_{ys}\,\Delta\xi_t/\Delta\phi_t$. From (2.6), the payoff when these individuals are given the
treatment optimum for them is,

$$\frac{\Delta\phi_t}{4a}\left(r_{ys}\frac{\Delta\xi_t}{\Delta\phi_t} + b\right)^2 + c\,\Delta\phi_t \tag{3.3}$$

The utility is

$$U_t = \frac{1}{4a\,\Delta\phi_t}\left(r_{ys}^2\,\Delta\xi_t^2 + 2b\,r_{ys}\,\Delta\xi_t\,\Delta\phi_t + b^2\,\Delta\phi_t^2\right) + c\,\Delta\phi_t - \Delta\phi_t\, C_y \tag{3.4}$$

Over all treatments, $\sum_t \Delta\xi_t = 0$ and $\sum_t \Delta\phi_t = 1$. Therefore

$$U = \frac{r_{ys}^2}{4a}\sum_t \frac{\Delta\xi_t^2}{\Delta\phi_t} + \frac{b^2}{4a} + c - C_y \tag{3.5}$$

From (3.2) and (3.5) the gain in utility is

$$\Delta U = \frac{r_{ys}^2}{4a}\sum_t \frac{\Delta\xi_t^2}{\Delta\phi_t} - C_y \tag{3.6}$$

$\Delta U$ is a function of $r_{ys}^2$. $B_{yd}$ is here equivalent to the first term of (3.6).

For men assigned to any one treatment, the gain in utility is found by
subtracting (3.2) from (3.4).

$$\Delta U_t = \frac{r_{ys}^2}{4a}\,\frac{\Delta\xi_t^2}{\Delta\phi_t} + \frac{b\,r_{ys}}{2a}\,\Delta\xi_t - \Delta\phi_t\, C_y \tag{3.7}$$


Adaptive selection 


The expression for utility in adaptive selection now follows readily from
(3.7). In adaptive selection a quota to be accepted $\phi(y')$ is specified, and all
accepted men are assigned to the one most suitable treatment. The remainder
are rejected, and their payoff is zero. A priori, a randomly selected group of
size $1 - \phi(y')$ would be so rejected, and payoff is zero. After testing, the
utility is zero less the cost of testing the rejected persons. Therefore the gain
in utility from testing depends entirely on the increase in utility for accepted
men, less the cost of testing the rejectees. Modifying (3.7) to take into account
the fact that $y_t'' = \infty$, and subtracting $(1 - \phi(y'))\,C_y$, the gain in utility is

$$\Delta U = \frac{r_{ys}^2}{4a}\,\frac{[\xi(y')]^2}{\phi(y')} + \frac{b\,r_{ys}}{2a}\,\xi(y') - C_y \tag{3.8}$$

The first two terms of (3.8) define $B_{yd}$ for adaptive selection.

To clarify certain points discussed in the text, expressions are required
for utilities which might result under several different conditions. Consider
three treatments: $t_a$, some treatment which happens to be in use a priori; $t_o$,
which is best suited to randomly selected men; and $t_y$, suited to men with test
scores above $y'$. $U_{ot_a}$ is the utility from assigning $\phi(y')$ randomly selected
men to treatment $t_a$. From (2.2):

$$U_{ot_a} = \phi(y')\left(c + b\,m_{t_a} - a\,m_{t_a}^2\right) \tag{3.9}$$

$U_{ot_o}$ has already been given in (3.2). Substituting (2.5) in (2.2) with
$s = r_{ys}\,\xi(y')/\phi(y')$, we have

$$U_{ot_y} = \phi(y')\left[c + \frac{b^2}{4a} - \frac{1}{4a}\left(r_{ys}\frac{\xi(y')}{\phi(y')}\right)^2\right] \tag{3.10}$$

The a posteriori utilities for men selected on the basis of test y may be denoted
$U_{yt}$. The average ability for these men is $r_{ys}\,\xi(y')/\phi(y')$, hence

$$U_{yt_a} = \phi(y')\left[m_{t_a}\, r_{ys}\frac{\xi(y')}{\phi(y')} + c + b\,m_{t_a} - a\,m_{t_a}^2\right] - C_y \tag{3.11}$$

$$U_{yt_o} = \phi(y')\left[\frac{b}{2a}\, r_{ys}\frac{\xi(y')}{\phi(y')} + c + \frac{b^2}{4a}\right] - C_y \tag{3.12}$$

$$U_{yt_y} = \phi(y')\left[\frac{1}{4a}\left(r_{ys}\frac{\xi(y')}{\phi(y')} + b\right)^2 + c\right] - C_y \tag{3.13}$$


Placement with adjusted quotas 


We shall next consider the case where the maximum number of treatments
is fixed, but the decision maker may alter the quotas as well as the treatments.
The optimum cutting score $y_t'$ between any two treatments t-1 and t may be
located by differentiating (3.6) with respect to $y_t'$:

$$\frac{\partial \Delta U}{\partial y_t'} = \frac{r_{ys}^2}{4a}\,\xi(y_t')\left[\frac{\Delta\xi_t^2 - 2y_t'\,\Delta\xi_t\,\Delta\phi_t}{\Delta\phi_t^2} - \frac{\Delta\xi_{t-1}^2 - 2y_t'\,\Delta\xi_{t-1}\,\Delta\phi_{t-1}}{\Delta\phi_{t-1}^2}\right] = 0 \tag{3.14}$$

Simplifying this expression,

$$y_t' = \frac{1}{2}\left(\frac{\Delta\xi_t}{\Delta\phi_t} + \frac{\Delta\xi_{t-1}}{\Delta\phi_{t-1}}\right) \tag{3.15}$$

Each cutting score thus lies midway between the average test scores of the two
adjacent intervals. This equation applies to any cutting point. With n
treatments, there are n-1 boundaries, and (3.15) provides n-1 simultaneous
equations. No general solution for this set of equations has been found.
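Although the simultaneous equations of (3.15) have no general algebraic solution, they lend themselves to successive approximation: start from trial cutting scores, compute the average score in each interval, move every cut to the midpoint of the neighbouring averages, and repeat. The sketch below is one such numerical scheme, not a method taken from the text; the function names, starting values, and stopping rule are assumptions.

```python
import math

def normal_ordinate(y):
    """Height of the standard normal curve at y; zero at plus or minus infinity."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def upper_tail_area(y):
    """Proportion of a standard normal population above y."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def interval_mean(lower, upper):
    """Average test score between two cuts: delta-xi over delta-phi."""
    d_xi = normal_ordinate(lower) - normal_ordinate(upper)
    d_phi = upper_tail_area(lower) - upper_tail_area(upper)
    return d_xi / d_phi

def adjusted_cutting_scores(n_treatments, sweeps=1000, tol=1e-10):
    """Iterate equation (3.15) until the cutting scores stop moving."""
    if n_treatments < 2:
        return []
    # Trial cuts spread evenly between -1 and +1 (an arbitrary starting point).
    cuts = [-1.0 + 2.0 * (i + 1) / n_treatments for i in range(n_treatments - 1)]
    for _ in range(sweeps):
        bounds = [-math.inf] + cuts + [math.inf]
        means = [interval_mean(bounds[i], bounds[i + 1]) for i in range(n_treatments)]
        new_cuts = [0.5 * (means[i] + means[i + 1]) for i in range(n_treatments - 1)]
        shift = max(abs(p - q) for p, q in zip(cuts, new_cuts))
        cuts = new_cuts
        if shift < tol:
            break
    return cuts

two_way = adjusted_cutting_scores(2)
three_way = adjusted_cutting_scores(3)
print(two_way, [round(c, 3) for c in three_way])
```

With two treatments the single cut falls at the mean; with three, the cuts settle near plus and minus .61. The cuts depend only on the number of treatments, since (3.15) contains no validity or cost term.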
TWO-STAGE SEQUENTIAL STRATEGY


Two tests $y_a$ and $y_b$ are available. One of these, designated $y_1$, will be
given to all persons. Persons scoring above a certain value $y_1''$ will be accepted
at once, and persons scoring below some other value $y_1'$ will be rejected. The
remaining persons will be given the second test and will be finally accepted or
rejected on the basis of the combined information from both tests. The
appropriate cutting scores $y_1'$ and $y_1''$ to provide the desired proportion $\phi$ of accepted
men are to be determined.

Men are to be accepted for a fixed treatment $t_A$. The assumptions of
Appendix 1 will be retained, save for the introduction of a second test. Since the two
tests $y_a$ and $y_b$ may be correlated, $y_2$ is used to designate the standardized
component of the designated second test which is independent of the first. The
validities of the two independent scores are $r_{y_1e}$ and $r_{y_2e}$. These two scores
may be combined into a battery score Y, using weights which maximize the
battery validity $r_{Ye}$.

$$r_{Ye}^2 = r_{y_1e}^2 + r_{y_2e}^2 \tag{4.1}$$

$$Y = \frac{r_{y_1e}\, y_1 + r_{y_2e}\, y_2}{r_{Ye}} \tag{4.2}$$

Since all persons for whom Y is constant contribute equal expected payoff
(from 1.1), the optimum cutoff for selecting among the men given the second
test is a value Y = Y' (shown as line MN in Figure 11). We now seek the
values of Y', $y_1'$ and $y_1''$ which give the greatest utility per man tested, with any
selection ratio.

Men in Region II of Figure 11 are accepted on the basis of the first test.
The gain in utility for these men, over the utility from men chosen by chance,
using (1.8) and momentarily disregarding cost, is

$$\Delta U_{II} = \sigma_e r_{y_1e}\,\xi(y_1'') \tag{4.3}$$

In Region III, the analogous gain is

$$\Delta U_{III} = \sigma_e r_{Ye}\int_{y_1'}^{y_1''}\int_{y_{2\cdot 1}}^{\infty} Y\,\xi(y_1 y_2)\, dy_2\, dy_1 \tag{4.4}$$

Men in regions I and IV are rejected and thus their utility is zero.

Let $y_{2\cdot 1}$ be the cutting score on $y_2$ for a specified $y_1$, and also let $y_2'$ and
$y_2''$ represent the values of $y_{2\cdot 1}$ corresponding to $y_1'$ and $y_1''$. From (4.2),

$$y_{2\cdot 1} = \frac{r_{Ye}\, Y' - r_{y_1e}\, y_1}{r_{y_2e}} \tag{4.5}$$

Noting that $\xi(y_1 y_2) = \xi(y_1)\,\xi(y_2)$, (4.4) becomes

$$\Delta U_{III} = \sigma_e\int_{y_1'}^{y_1''}\int_{y_{2\cdot 1}}^{\infty}\left(r_{y_1e}\, y_1 + r_{y_2e}\, y_2\right)\xi(y_1)\,\xi(y_2)\, dy_2\, dy_1
= \sigma_e\int_{y_1'}^{y_1''}\left[r_{y_1e}\, y_1\,\phi(y_{2\cdot 1}) + r_{y_2e}\,\xi(y_{2\cdot 1})\right]\xi(y_1)\, dy_1 \tag{4.6}$$


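We may designate the costs of the two tests as $C_{y_1}$ and $C_{y_2}$. Since all
persons are given the first test and only those between $y_1'$ and $y_1''$ are given the
second, the average cost per man tested is

$$C_{y_1} + C_{y_2}\left[\phi(y_1') - \phi(y_1'')\right] \tag{4.7}$$

The net gain in utility as a result of testing is

$$\Delta U = \sigma_e r_{y_1e}\,\xi(y_1'') + \sigma_e\int_{y_1'}^{y_1''}\left[r_{y_1e}\, y_1\,\phi(y_{2\cdot 1}) + r_{y_2e}\,\xi(y_{2\cdot 1})\right]\xi(y_1)\, dy_1 - C_{y_1} - C_{y_2}\left[\phi(y_1') - \phi(y_1'')\right] \tag{4.8}$$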


The selection ratio is the combined proportion in II and III:

$$\phi = \phi(y_1'') + \int_{y_1'}^{y_1''}\xi(y_1)\,\phi(y_{2\cdot 1})\, dy_1 \tag{4.9}$$

Since $\phi(y_{2\cdot 1})$ is a function of Y' and $y_1$, equation (4.9) establishes a
relationship between $y_1'$, $y_1''$ and Y'. Therefore only two of these can be independent
variables. To maximize $\Delta U$ we follow the method of Lagrange for a restrained
maximum, introducing an undetermined multiplier $\lambda$. Three simultaneous
equations define the conditions for the maximum:


$$\frac{\partial \Delta U}{\partial Y'} + \lambda\frac{\partial \phi}{\partial Y'} = 0, \qquad
\frac{\partial \Delta U}{\partial y_1'} + \lambda\frac{\partial \phi}{\partial y_1'} = 0, \qquad
\frac{\partial \Delta U}{\partial y_1''} + \lambda\frac{\partial \phi}{\partial y_1''} = 0 \tag{4.10}$$

Substituting the appropriate values in (4.10) and transposing,

$$\sigma_e r_{y_1e}\, y_1' + \frac{\sigma_e r_{y_2e}\,\xi(y_2') - C_{y_2}}{\phi(y_2')} = -\lambda$$

$$\sigma_e r_{y_1e}\, y_1'' - \frac{\sigma_e r_{y_2e}\,\xi(y_2'') - C_{y_2}}{1 - \phi(y_2'')} = -\lambda \tag{4.11}$$

$$\sigma_e r_{Ye}\, Y' = -\lambda$$

Eliminating $\lambda$ from equations (4.11) and noting that $1 - \phi(x) = \phi(-x)$ and that
$\xi(x) = \xi(-x)$ gives

$$r_{y_1e}\, y_1' + r_{y_2e}\frac{\xi(y_2')}{\phi(y_2')} - \frac{C_{y_2}}{\sigma_e\,\phi(y_2')} = r_{Ye}\, Y'$$

$$r_{y_1e}\, y_1'' - r_{y_2e}\frac{\xi(-y_2'')}{\phi(-y_2'')} + \frac{C_{y_2}}{\sigma_e\,\phi(-y_2'')} = r_{Ye}\, Y' \tag{4.12}$$

These equations define $y_1'$ and $y_1''$ respectively in terms of $r_{y_1e}$, $r_{y_2e}$, $C_{y_2}/\sigma_e$,
and Y'.

For computational purposes it is advantageous to determine the $y_2$
coordinates of points M and N, i.e. $y_2'$ and $y_2''$. We use equation (4.5) to eliminate Y'
from (4.12),

$$r_{y_2e}\frac{\xi(y_2')}{\phi(y_2')} - \frac{C_{y_2}}{\sigma_e\,\phi(y_2')} = r_{y_2e}\, y_2'$$

$$-r_{y_2e}\frac{\xi(-y_2'')}{\phi(-y_2'')} + \frac{C_{y_2}}{\sigma_e\,\phi(-y_2'')} = r_{y_2e}\, y_2'' \tag{4.13}$$

Hence

$$\xi(y_2') - y_2'\,\phi(y_2') = \frac{C_{y_2}}{\sigma_e r_{y_2e}} = \xi(-y_2'') + y_2''\,\phi(-y_2'') \tag{4.14}$$

Thus $y_2'$ and $y_2''$ are equal in absolute value and depend only on $C_{y_2}/\sigma_e r_{y_2e}$.
The optimum pre-reject or pre-accept strategy could be derived separately.
However, the fact that the partial derivative with respect to $y_1'$ is independent
of $y_1''$ in (4.11) shows that the values of $y_1'$ and $y_1''$ so obtained would be identical
to those for the complete sequential strategy.

In order to simplify comparison of the various testing strategies it is
advantageous to express (4.12) and (4.14) in terms of the correlation of each test
with the total battery rather than in terms of their validity coefficients. The
validity of either stage of testing can be written thus:

$$r_{y_1e} = r_{y_1Y}\, r_{Ye}, \qquad r_{y_2e} = r_{y_2Y}\, r_{Ye} \tag{4.15}$$

The terms $r_{y_1Y}$ and $r_{y_2Y}$ are not independent since

$$r_{y_1Y}^2 + r_{y_2Y}^2 = 1 \tag{4.16}$$

Then (4.12) becomes

$$r_{y_1Y}\, y_1' + r_{y_2Y}\frac{\xi(y_2')}{\phi(y_2')} - \frac{C_{y_2}}{\sigma_e r_{Ye}\,\phi(y_2')} = Y'$$

$$r_{y_1Y}\, y_1'' - r_{y_2Y}\frac{\xi(-y_2'')}{\phi(-y_2'')} + \frac{C_{y_2}}{\sigma_e r_{Ye}\,\phi(-y_2'')} = Y' \tag{4.17}$$

and for (4.14) we have

$$\xi(y_2') - y_2'\,\phi(y_2') = \frac{C_{y_2}}{\sigma_e r_{Ye}\, r_{y_2Y}} \tag{4.18}$$

Using this formulation it is possible to treat $C_{y_2}/\sigma_e r_{Ye}$ as a single constant
$C_2$ which is given (or can be specified hypothetically). Figure A-1 based on
(4.18) permits ready determination of $y_2'$.

To solve a particular problem one begins with $r_{y_1Y}$ and $C_2$, and some trial
value of Y' such that $\phi(Y')$ is near the desired selection ratio. Then $y_2'$ and $y_2''$
are obtained from Figure A-1, and $y_1'$ and $y_1''$ are found from (4.5).
Any Y' yields one pair of first-screen cutoffs. The selection ratio obtained
for the trial Y' is found from (4.9). The gain in utility is given by (4.8), but it
is convenient to rewrite that equation in this form:

$$\frac{\Delta U}{\sigma_e r_{Ye}} = r_{y_1Y}\,\xi(y_1'') + \int_{y_1'}^{y_1''}\left[r_{y_1Y}\, y_1\,\phi(y_{2\cdot 1}) + r_{y_2Y}\,\xi(y_{2\cdot 1})\right]\xi(y_1)\, dy_1 - \frac{C_{y_1}}{\sigma_e r_{Ye}} - \frac{C_{y_2}}{\sigma_e r_{Ye}}\left[\phi(y_1') - \phi(y_1'')\right] \tag{4.19}$$

Figure A-1. Chart for determining optimal cutting scores on second test

There is no simple procedure for solving directly from $\phi$. Successive trials
with interpolation will determine the values of $y_1'$, $y_1''$ and Y' that yield a
desired $\phi$.
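One trial of the procedure just described can be sketched numerically, with a root-finder standing in for the chart of Figure A-1. This is a sketch under stated assumptions, not the authors' computing routine: the function names are hypothetical, the validities of .40 and .30 and the cost ratio of .10 are illustrative, and the relation used is the form of (4.14) in which the second-test cutoffs are equal in absolute value.

```python
import math

def normal_ordinate(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def upper_tail_area(y):
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def second_test_cut(cost_ratio, lo=-8.0, hi=8.0, tol=1e-12):
    """Solve xi(y2') - y2' * phi(y2') = C_y2 / (sigma_e * r_y2e) by bisection."""
    def excess(y):
        return normal_ordinate(y) - y * upper_tail_area(y) - cost_ratio
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def first_screen_cutoffs(r_y1e, r_y2e, battery_cut, cost_ratio):
    """Invert (4.5) at y2' and at y2'' = -y2' to obtain y1' and y1''."""
    r_battery = math.hypot(r_y1e, r_y2e)          # equation (4.1)
    y2_cut = second_test_cut(cost_ratio)
    lower = (r_battery * battery_cut - r_y2e * y2_cut) / r_y1e
    upper = (r_battery * battery_cut + r_y2e * y2_cut) / r_y1e
    return lower, upper

# Hypothetical trial: validities .40 and .30, battery cutoff Y' = 0,
# and C_y2 / (sigma_e * r_y2e) = .10.
lower_cut, upper_cut = first_screen_cutoffs(0.40, 0.30, 0.0, 0.10)
share_given_second_test = upper_tail_area(lower_cut) - upper_tail_area(upper_cut)
print(round(lower_cut, 3), round(upper_cut, 3), round(share_given_second_test, 3))
```

The selection ratio for this trial would still have to be found from (4.9) and the trial value of Y' adjusted, as the text indicates.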
OPTIMUM LENGTH OF A TEST


A test is said to be lengthened homogeneously when we combine unit tests
having the same cost, equal intercorrelations $r_{y_1y_1}$, and the same relation to
criteria (i.e., the same $r_{y_1e}$). The relation of the unit test to the criterion is
assumed to be the same regardless of the unit's position in the series.
Correlation increases with length k according to this well-known equation:

$$r_{y_ke} = r_{y_1e}\sqrt{\frac{k}{1 + (k-1)\,r_{y_1y_1}}} \tag{5.1}$$

A similar equation may be written for $r_{y_ks}$.

The cost, formerly denoted by $C_y$, may be divided into two components, $C_0$
and $C_1$. $C_0$ consists of the basic cost of setting up the test, giving directions,
etc.; $C_0$ is constant as k changes. $C_1$ represents the costs associated with
taking a test of unit length (supplies, time of examiner and subjects if paid, etc.).
This cost increases proportionally to length. Hence for a test k times as long
as the unit test, the total cost is

$$C_{y_k} = C_0 + kC_1 \tag{5.2}$$


Fixed treatment, fixed quotas


From (1.19), gain in utility per man tested in either selection or placement
is

$$\Delta U_k = \sum_t \sigma_e r_{y_ke_t}\,\Delta\xi_t - C_{y_k} \tag{5.3}$$

Substituting from (5.1) and (5.2),

$$\Delta U_k = B_{y_1d}\sqrt{\frac{k}{1 + (k-1)\,r_{y_1y_1}}} - (C_0 + kC_1) \tag{5.4}$$

$B_{y_1d}$ is defined for fixed treatment by the value of the first term of (5.3) when
k is 1. The function has a maximum, provided $\Delta U > 0$ for some value of k.
The maximum occurs where

$$\frac{\partial \Delta U_k}{\partial k} = \frac{B_{y_1d}\,(1 - r_{y_1y_1})}{2k^{1/2}\left[1 + (k-1)\,r_{y_1y_1}\right]^{3/2}} - C_1 = 0 \tag{5.5}$$

This condition may be rewritten thus:

$$\left[\frac{B_{y_1d}\,(1 - r_{y_1y_1})}{2C_1}\right]^2 = k\left[1 + (k-1)\,r_{y_1y_1}\right]^3 \tag{5.6}$$

One root of this equation gives the desired optimum length.
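The optimum length implied by (5.5) can be located numerically, since the marginal benefit of added length falls steadily as k grows. The following sketch treats k as a continuous quantity and uses bisection; it is an illustration only, and the function names and the figures (unit benefit 1.0, unit reliability .20, unit cost .01, fixed cost .05) are hypothetical.

```python
import math

def gain_at_length(benefit_unit, reliability_unit, fixed_cost, unit_cost, k):
    """Equation (5.4): gain per man tested for a test k units long."""
    growth = math.sqrt(k / (1.0 + (k - 1.0) * reliability_unit))
    return benefit_unit * growth - (fixed_cost + k * unit_cost)

def optimum_length(benefit_unit, reliability_unit, unit_cost, tol=1e-10):
    """Find the k at which the marginal benefit in (5.5) equals C_1."""
    def marginal_excess(k):
        spread = 1.0 + (k - 1.0) * reliability_unit
        benefit = benefit_unit * (1.0 - reliability_unit) / (2.0 * math.sqrt(k) * spread ** 1.5)
        return benefit - unit_cost
    lo, hi = 1e-9, 1.0
    while marginal_excess(hi) > 0.0:      # widen the bracket until the root is enclosed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

best_k = optimum_length(1.0, 0.20, 0.01)
best_gain = gain_at_length(1.0, 0.20, 0.05, 0.01, best_k)
print(round(best_k, 2), round(best_gain, 3))
```

In practice k must be a whole number of units, so the gain at the integers on either side of the root would be compared. The root is meaningful only if the gain there is positive.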
In selection or placement a single-stage battery of v tests may be used.
If $R_{Y_ve_t}$ is the correlation of the battery score with payoff,

$$\Delta U_v = \sum_t \sigma_e R_{Y_ve_t}\,\Delta\xi_t - C_{Y_v} \tag{5.7}$$

Since utility of the battery with the vth test removed is expressed by the same
formula with v-1 substituted for v, the gain from adding the vth test is the
increment

$$\Delta U_{y_v} = \sum_t \sigma_e\,\Delta\xi_t\left(R_{Y_ve_t} - R_{Y_{v-1}e_t}\right) - C_{y_v} \tag{5.8}$$

$R_{Y_ve_t} - R_{Y_{v-1}e_t}$ is unlikely to be the same for all treatments.
If the problem is one of selection for a single treatment, (5.8) becomes

$$\Delta U_{y_v} = \left(R_{Y_ve} - R_{Y_{v-1}e}\right)\sigma_e\,\xi(y') - C_{y_v} = \Delta R_v\,\sigma_e\,\xi(y') - C_{y_v} \tag{5.9}$$

We may now consider the advisability of adding a unit test of such length that
its variable cost is $C_1$. The battery may be augmented profitably by adding a
unit of any test already in the battery if for that unit

$$\Delta R_v > \frac{C_1}{\sigma_e\,\xi(y')} \tag{5.10}$$

or a unit of any test y not yet in the battery if for that unit

$$\Delta R_v > \frac{C_0 + C_1}{\sigma_e\,\xi(y')} \tag{5.11}$$

Among all units available, that one should be added first which provides the
greatest increment in utility. Horst (39) has developed general procedures
for adjusting the length of the tests in a battery so as to maximize $R_{Ye}$, under
the assumption that $C_0$ is negligible.


Adaptive placement, fixed quotas 


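When treatments may be adjusted to fit the aptitude s of selected men, from
(3.6)

$$\Delta U_k = \frac{r_{y_ks}^2}{4a}\sum_t \frac{\Delta\xi_t^2}{\Delta\phi_t} - C_{y_k} \tag{5.12}$$

Substituting from (5.1) and (5.2),

$$\Delta U_k = B_{y_1d}\,\frac{k}{1 + (k-1)\,r_{y_1y_1}} - (C_0 + kC_1) \tag{5.13}$$

Here $B_{y_1d}$ is the benefit under adaptive treatment, the value of the first term
of (5.12) when k = 1. This function has a maximum (provided $\Delta U > 0$ for
any k) where

$$\frac{\partial \Delta U_k}{\partial k} = \frac{B_{y_1d}\,(1 - r_{y_1y_1})}{\left[1 + (k-1)\,r_{y_1y_1}\right]^2} - C_1 = 0 \tag{5.14}$$

$$\left[1 + (k-1)\,r_{y_1y_1}\right]^2 = \frac{B_{y_1d}\,(1 - r_{y_1y_1})}{C_1} \tag{5.15}$$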


Adaptive selection, fixed quotas 


In selection, from (3.8)

$$\Delta U_k = \frac{r_{y_ks}^2}{4a}\,\frac{[\xi(y')]^2}{\phi(y')} + \frac{b\,r_{y_ks}}{2a}\,\xi(y') - C_{y_k} \tag{5.16}$$

Substituting from (5.1) and (5.2),

$$\Delta U_k = \frac{r_{y_1s}^2}{4a}\,\frac{[\xi(y')]^2}{\phi(y')}\,\frac{k}{1 + (k-1)\,r_{y_1y_1}} + \frac{b}{2a}\,\xi(y')\,r_{y_1s}\sqrt{\frac{k}{1 + (k-1)\,r_{y_1y_1}}} - (C_0 + kC_1) \tag{5.17}$$

[...] (5.18)

$$\frac{\partial \Delta U_k}{\partial k} = \frac{r_{y_1s}^2}{4a}\,\frac{[\xi(y')]^2}{\phi(y')}\,\frac{1 - r_{y_1y_1}}{\left[1 + (k-1)\,r_{y_1y_1}\right]^2} + \frac{b}{2a}\,\xi(y')\,r_{y_1s}\,\frac{1 - r_{y_1y_1}}{2k^{1/2}\left[1 + (k-1)\,r_{y_1y_1}\right]^{3/2}} - C_1 \tag{5.19}$$

Setting this equal to zero gives an expression in k one of whose roots is the
optimum length of test for adaptive selection.


STRATEGY FOR MULTI-STAGE SELECTION


The following appendix describes the procedures developed by the
Statistical Research Group (SRG) (35, 62) for dividing persons into two groups so as
to minimize risks associated with misclassification. We have paraphrased
their argument to fit the personnel problem.

A unit test $y_1$ is to be used at each stage of testing. The person has a
"true score" $y_\infty$ which is his average score after an infinite number of stages
of testing. Since $y_\infty$ is a predictor of criterion performance, [...] select persons
whose [...] either rejecting [...]. The reliability $r_{y_1y_1}$ is known; [...]
standard deviation $\sigma$ of $y_1$ [...]. While no serious loss [...] Pearson risks,
two values $w_1$ and $w_2$ are located. A judgment is then made as to the tolerable
risk [...] $w_2$ and the tolerable risk [...] $w_1$. These risks [...]

$$h_1 = b\sigma^2/(w_2 - w_1) \tag{6.1}$$

$$h_2 = a\sigma^2/(w_2 - w_1) \tag{6.2}$$

$$S = (w_1 + w_2)/2 \tag{6.3}$$

The parameters a and b are functions of $\alpha$ and $\beta$ which have been tabled by
SRG:

$$a = \ln\frac{1-\beta}{\alpha}, \qquad b = \ln\frac{1-\alpha}{\beta} \tag{6.4}$$


The person's score is totalled at the end of every stage of testing. If, at
the kth stage, the score is greater than $h_2 + kS$, the person is accepted; if
less than $-h_1 + kS$, the person is rejected. If it is between these values,
testing is continued. A simple graphical procedure for making these decisions is
available.
Operating characteristic and selection ratio. The strategy has an
operating characteristic function, which describes $P_{A/y_\infty}$, that is, the probability of
accepting a person at each true quality level. This function is approximately
ogival in shape and passes through the points $(w_1, \alpha)$ and $(w_2, 1 - \beta)$, and is
described by

$$P_{A/y_\infty} = \frac{e^{g_1} - 1}{e^{g_2} - 1} \tag{6.5}$$

Here, e is the base of the natural logarithms and the parameters $g_1$ and $g_2$
are defined by

$$g_1 = 2h_1(S - y_\infty)/\sigma^2, \qquad g_2 = 2(h_1 + h_2)(S - y_\infty)/\sigma^2 \tag{6.6}$$

If, as in our work, the distribution of $y_\infty$ is assumed to be known, the
$\sum p_{y_\infty} P_{A/y_\infty}$ indicates the proportion of applicants who will be accepted, or
the selection ratio.
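The decision rule and the operating characteristic can be sketched together. This is an illustration under stated assumptions rather than the SRG procedure itself: it assumes the sign convention in which $h_1$ and $h_2$ are both positive, so that the rejection line is $-h_1 + kS$; the function names are hypothetical; and the figures (w1 = -.5, w2 = .5, both risks .10, sigma = 1) are chosen only for illustration.

```python
import math

def sequential_plan(w1, w2, alpha, beta, sigma):
    """Equations (6.1)-(6.4): intercepts h1, h2 and slope S of the decision lines."""
    a = math.log((1.0 - beta) / alpha)
    b = math.log((1.0 - alpha) / beta)
    h1 = b * sigma ** 2 / (w2 - w1)
    h2 = a * sigma ** 2 / (w2 - w1)
    slope = 0.5 * (w1 + w2)
    return h1, h2, slope

def decide(total_score, stage, plan):
    """Accept above h2 + kS, reject below -h1 + kS, otherwise keep testing."""
    h1, h2, slope = plan
    if total_score > h2 + stage * slope:
        return "accept"
    if total_score < -h1 + stage * slope:
        return "reject"
    return "continue"

def acceptance_probability(true_score, plan, sigma):
    """Equations (6.5)-(6.6), arranged to avoid overflow for large exponents."""
    h1, h2, slope = plan
    g1 = 2.0 * h1 * (slope - true_score) / sigma ** 2
    g2 = 2.0 * (h1 + h2) * (slope - true_score) / sigma ** 2
    if abs(g2) < 1e-12:
        return h1 / (h1 + h2)                 # limiting value at true_score = S
    if g2 > 0.0:
        return math.exp(g1 - g2) * (1.0 - math.exp(-g1)) / (1.0 - math.exp(-g2))
    return math.expm1(g1) / math.expm1(g2)

plan = sequential_plan(-0.5, 0.5, 0.10, 0.10, 1.0)
print(decide(3.0, 1, plan), decide(-3.0, 1, plan), decide(0.5, 2, plan))
print(acceptance_probability(-0.5, plan, 1.0), acceptance_probability(0.5, plan, 1.0))
```

Under these assumptions the probability of acceptance is .10 at the lower point and .90 at the upper, as the two fixed points of the operating characteristic require.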


Average sample number and cost of testing. The number of tests to be
given each person will depend on his scores. For a large number of persons
having the same true score, we can predict the average number of stages of
testing $\bar{k}$ which will be required.

$$\bar{k}_{y_\infty} = \frac{h_1 - (h_1 + h_2)\,P_{A/y_\infty}}{S - y_\infty} \tag{6.7}$$

Multiplying by the probability of each $y_\infty$, and summing, gives the total number
of tests. Introducing the cost of the unit test then permits an estimate of the
total cost of all tests.


Calculating operations. A computer program has been prepared which
permits calculation of the relation between selection ratio, average quality of
persons accepted, and cost of testing. The program assumes that $\alpha = \beta$,
which is not a very restrictive assumption, and that ability is normally
distributed and error of measurement uniform for all values of $y_\infty$. The parameters
for any given calculation are $\alpha$, $r_{y_1y_1}$, and $C_1$, the cost per unit test. A
working parameter $y_o$ is also introduced, but drops out of the functions obtained.
This program was employed to obtain the results given in the text.

The program also permits a person seeking a good strategy for a
particular decision to start with a known selection ratio and cost of testing.
Assuming a value for $w_1 - w_2$ and using nomograms not reproduced here, one can
determine appropriate values of $\alpha$ and $y_o$ for these conditions, and then
derive the strategy from (6.1) and (6.2). This strategy will probably be different
from the true optimum obtained by Wald's recursion formula, but it is expected
to be a good approximation.


DISTRIBUTION OF EFFORT IN COMPOUND DECISIONS


When several decisions are to be made, and a test is available relevant to
each, a decision must be made regarding the optimum length of each test. That
is to say, the composition of the testing battery is to be adjusted so as to
maximize utility. Assume that there are w independent decisions, each dependent
on a different aptitude so that payoffs are uncorrelated. There is a test $y_d$
bearing on decision d and irrelevant to the others. The unit of length for each
test is such that its cost is $C_1$. The remaining parameters for each test are
$r_{y_1y_1,d}$, $B_{y_1d}$ and $C_{0d}$. We shall assume that $C_{0d}$ depends only on the time
necessary to distribute the test, give instructions, etc., and thus is fixed for
each test regardless of its length but can vary from test to test.

For any test, as in (5.2) the cost at length k is

$$C_{y_kd} = C_{0d} + kC_1 \tag{7.1}$$

We shall fix the total allowable cost $C_T$ for the battery. Since we assume that
costs depend only on time, this is equivalent to assuming a fixed total testing
time.

Fixed treatment 
As in (5.4), if treatments and quotas are fixed, when test Yq is used at length 
ka’ the gain in utility per man for that decision is 


E 
YKr 
Dkk = (Coat 4S) (7.2) 


F, 
yyvid 


put we shall consider the utility of any 


jons, there are W tests, 
pect to d, 


For the w decis 
en summi: 


battery of v tests where V <w. Th ng with res 
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g 
ZG 5 = Beg + C Sky = © (7-3) 
1 


Cr (7.4) 


It may be noted from (7.3) that the total number of items ($\sum k_d = K$)
which can be used for any battery is a constant determined by
$(C_T - \sum C_{0d})/C_1$. In order to maximize $\sum \Delta U$, it is
necessary again to use the method of Lagrange as in (4.10). With respect to
any $k_d$, as in (5.5),

\[ \frac{\partial \sum \Delta U_{k_d}}{\partial k_d} = \frac{B_{y_1,d}\,(1 - r_{y_1y_1,d})}{2\, k_d^{1/2} \left[ 1 + (k_d - 1)\, r_{y_1y_1,d} \right]^{3/2}} \tag{7.5} \]

Letting $\lambda$ be an unspecified multiplier, and retaining the constraint of
(7.3), we obtain a series of $v$ equations of the form

\[ \frac{B_{y_1,d}\,(1 - r_{y_1y_1,d})}{2\, k_d^{1/2} \left[ 1 + (k_d - 1)\, r_{y_1y_1,d} \right]^{3/2}} = \lambda \tag{7.6} \]

No algebraic solution for this system of equations has been obtained.
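The system can nevertheless be solved numerically. The Python sketch below is an illustration added here, not part of the original derivation; the function names, the bisection scheme, and the sample parameter values are all assumptions. It relies on the fact that the left side of (7.6), $B_{y_1,d}(1 - r_{y_1y_1,d}) / (2 k_d^{1/2} [1 + (k_d - 1) r_{y_1y_1,d}]^{3/2})$, falls steadily as $k_d$ increases when the reliability lies between 0 and 1. Each $k_d$ can therefore be found by bisection for a trial value of $\lambda$, and $\lambda$ can in turn be adjusted until the lengths sum to $K$. Lengths are treated as continuous quantities.

```python
import math

def marginal_gain(B, r, k):
    """Left side of (7.6): derivative of B*sqrt(k/(1+(k-1)r)) with respect to k."""
    return B * (1.0 - r) / (2.0 * math.sqrt(k) * (1.0 + (k - 1.0) * r) ** 1.5)

def length_for_multiplier(B, r, lam, k_max):
    """Length k in (0, k_max] at which the marginal gain equals lam."""
    if marginal_gain(B, r, k_max) >= lam:
        return k_max                      # cap at the whole budget
    lo, hi = 1e-9, k_max
    for _ in range(200):                  # bisection; marginal gain decreases in k
        mid = 0.5 * (lo + hi)
        if marginal_gain(B, r, mid) > lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def allocate_fixed(tests, K):
    """tests: list of (B, r) pairs; K: total length units allowed by (7.3).

    Returns the lengths k_d and the multiplier lam satisfying (7.6).
    """
    lam_lo, lam_hi = 1e-12, 1e12          # assumed bracket for the multiplier
    for _ in range(200):
        lam = math.sqrt(lam_lo * lam_hi)  # bisect on a logarithmic scale
        total = sum(length_for_multiplier(B, r, lam, K) for B, r in tests)
        if total > K:
            lam_lo = lam                  # battery too long: raise the multiplier
        else:
            lam_hi = lam
    lam = math.sqrt(lam_lo * lam_hi)
    return [length_for_multiplier(B, r, lam, K) for B, r in tests], lam

# Hypothetical battery of three tests, each given as (B, reliability at unit length).
example_tests = [(10.0, 0.1), (5.0, 0.2), (8.0, 0.05)]
example_lengths, example_lam = allocate_fixed(example_tests, K=30.0)
```

With these assumed values the three lengths sum to 30 and every test has the same marginal gain, which is the condition stated by (7.6). In practice the lengths would be rounded to whole numbers of items.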


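Special cases. We can treat the equation directly when $v = 2$. Then, from
(7.3),

\[ k_2 = \frac{C_T - C_{01} - C_{02}}{C_1} - k_1 = K - k_1 \tag{7.7} \]

and, from (7.6),

\[ \frac{B_{y_1,1}\,(1 - r_{y_1y_1,1})}{2\, k_1^{1/2} \left[ 1 + (k_1 - 1)\, r_{y_1y_1,1} \right]^{3/2}} = \frac{B_{y_1,2}\,(1 - r_{y_1y_1,2})}{2\, (K - k_1)^{1/2} \left[ 1 + (K - k_1 - 1)\, r_{y_1y_1,2} \right]^{3/2}} \tag{7.8} \]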


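This equation, involving the single unknown $k_1$, indicates the optimum
distribution of effort between two tests when the other parameters are known.

Another special case is that where all $v$ tests and the related decisions are
uniform, i.e., $r_{y_1y_1,d}$, $B_{y_1,d}$, and $C_{0d}$ are the same for all
decisions (and are written below without the subscript $d$). Then, from (7.6),
$k_d$ is the same for every decision. From (7.3),

\[ v = \frac{C_T}{C_0 + k C_1} \tag{7.9} \]

This permits us to inquire as to the optimum number of tests to be used. (7.4)
becomes

\[ \sum \Delta U_k = v B_{y_1} \left[ \frac{k}{1 + (k - 1)\, r_{y_1y_1}} \right]^{1/2} - C_T \tag{7.10} \]

Substituting for $v$ (using (7.9)), and maximizing $\sum \Delta U$ with respect
to $k$,

\[ \frac{\partial \sum \Delta U_k}{\partial k} = \frac{C_T B_{y_1} (1 - r_{y_1y_1})}{2\, k^{1/2} (C_0 + k C_1) \left[ 1 + (k - 1)\, r_{y_1y_1} \right]^{3/2}} - \frac{C_T B_{y_1} C_1\, k^{1/2}}{\left[ 1 + (k - 1)\, r_{y_1y_1} \right]^{1/2} (C_0 + k C_1)^2} = 0 \tag{7.11} \]

Simplifying,

\[ (1 - r_{y_1y_1})(C_0 + k C_1) = 2 k C_1 \left[ 1 + (k - 1)\, r_{y_1y_1} \right] \tag{7.12} \]

\[ k = \frac{1 - r_{y_1y_1}}{4\, r_{y_1y_1}} \left[ -1 \pm \left( 1 + \frac{8\, r_{y_1y_1} C_0}{(1 - r_{y_1y_1})\, C_1} \right)^{1/2} \right] \tag{7.13} \]

Taken with the positive square root, this value of $k$ yields maximum utility.
The second root of (7.13) is ignored because it yields a negative value of $k$,
which has no practical meaning. The optimum $v$ is determined by substitution
in (7.9).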
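As a numerical illustration of the uniform fixed-treatment case, the following Python sketch evaluates the positive root of the quadratic $(1 - r)(C_0 + kC_1) = 2kC_1[1 + (k - 1)r]$ and then obtains $v$ from $v = C_T/(C_0 + kC_1)$. It is an added example rather than part of the original text: the function names and all parameter values are hypothetical, and $v$ is treated as continuous although in practice it would be rounded to a whole number of tests.

```python
import math

def optimal_length_fixed(r, C0, C1):
    """Positive root of 2*r*C1*k**2 + (1-r)*C1*k - (1-r)*C0 = 0, i.e. of (7.12)."""
    root = math.sqrt(1.0 + 8.0 * r * C0 / ((1.0 - r) * C1))
    return (1.0 - r) / (4.0 * r) * (root - 1.0)

def battery_gain_fixed(k, B, r, C0, C1, CT):
    """Total gain (7.10) after substituting v = CT / (C0 + k*C1) from (7.9)."""
    v = CT / (C0 + k * C1)
    return v * B * math.sqrt(k / (1.0 + (k - 1.0) * r)) - CT

# Hypothetical values: unit-length reliability 0.1, set-up cost 5, cost per unit 1.
k_fixed = optimal_length_fixed(r=0.1, C0=5.0, C1=1.0)      # approximately 3 units
v_fixed = 80.0 / (5.0 + k_fixed * 1.0)                     # approximately 10 tests
gain_fixed = battery_gain_fixed(k_fixed, B=10.0, r=0.1, C0=5.0, C1=1.0, CT=80.0)
```

With these assumed values the optimum is roughly ten short tests of about three units each. Evaluating `battery_gain_fixed` at neighboring values of `k` gives a smaller gain, which is a convenient check on the algebra.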


Adaptive treatment 
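A similar analysis for adaptive placement leads to a simpler solution. The
equation analogous to (7.4) is, from (5.13),

\[ \sum_{d=1}^{v} \Delta U_{k_d} = \sum_{d=1}^{v} B_{y_1,d}\, \frac{k_d}{1 + (k_d - 1)\, r_{y_1y_1,d}} - \sum_{d=1}^{v} (C_{0d} + k_d C_1) \tag{7.14} \]

The method of Lagrange leads to $v$ equations of this form:

\[ \frac{B_{y_1,d}\,(1 - r_{y_1y_1,d})}{\left[ 1 + (k_d - 1)\, r_{y_1y_1,d} \right]^{2}} = \lambda \tag{7.15} \]

Letting $u = \lambda^{-1/2}$, we find

\[ 1 + (k_d - 1)\, r_{y_1y_1,d} = u \left[ B_{y_1,d}\,(1 - r_{y_1y_1,d}) \right]^{1/2} \tag{7.16} \]

If we let

\[ D_d = \frac{\left[ B_{y_1,d}\,(1 - r_{y_1y_1,d}) \right]^{1/2}}{r_{y_1y_1,d}} \tag{7.17} \]

and recall that $\sum k_d = K$ is a constant determined by test costs, we find
that

\[ k_d = 1 - \frac{1}{r_{y_1y_1,d}} + u D_d \tag{7.18} \]

and

\[ \sum_{d=1}^{v} k_d = K = v - \sum_{d=1}^{v} \frac{1}{r_{y_1y_1,d}} + u \sum_{d=1}^{v} D_d \tag{7.19} \]

Therefore

\[ u = \frac{1}{\sum_{d=1}^{v} D_d} \left[ \sum_{d=1}^{v} \frac{1}{r_{y_1y_1,d}} - v + K \right] \tag{7.20} \]

Denoting any particular $d$ by $d'$,

\[ k_{d'} = 1 - \frac{1}{r_{y_1y_1,d'}} + \frac{D_{d'}}{\sum_{d=1}^{v} D_d} \left[ \sum_{d=1}^{v} \frac{1}{r_{y_1y_1,d}} - v + K \right] \tag{7.21} \]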
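Because the adaptive case has a closed solution, the allocation can be computed directly. The Python sketch below is an added illustration under stated assumptions: it writes $D_d = [B_{y_1,d}(1 - r_{y_1y_1,d})]^{1/2} / r_{y_1y_1,d}$ and $u = (\sum 1/r_{y_1y_1,d} - v + K)/\sum D_d$, and sets $k_d = 1 - 1/r_{y_1y_1,d} + uD_d$. The function names and parameter values are hypothetical, and lengths are treated as continuous.

```python
import math

def allocate_adaptive(tests, K):
    """tests: list of (B, r) pairs; K: total length units available.

    Returns the lengths k_d = 1 - 1/r_d + u * D_d, which sum to K.
    """
    v = len(tests)
    D = [math.sqrt(B * (1.0 - r)) / r for B, r in tests]
    sum_inv_r = sum(1.0 / r for _, r in tests)
    u = (sum_inv_r - v + K) / sum(D)
    return [1.0 - 1.0 / r + u * d for (_, r), d in zip(tests, D)]

def marginal_gain_adaptive(B, r, k):
    """Derivative of B*k/(1+(k-1)r) with respect to k."""
    return B * (1.0 - r) / (1.0 + (k - 1.0) * r) ** 2

# Hypothetical battery: (B, reliability at unit length) for each of three tests.
adaptive_tests = [(10.0, 0.1), (5.0, 0.2), (8.0, 0.05)]
adaptive_lengths = allocate_adaptive(adaptive_tests, K=30.0)
# Roughly 10.6, 2.5, and 16.9 units for these values.
```

At the computed lengths every test has the same marginal gain, as the Lagrange condition requires. One caution that is an assumption of this sketch rather than a statement in the text: for a weak test the formula can return a negative length, in which case that test would have to be dropped and the allocation recomputed for the remaining tests.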


Alternative batteries, made up from any set of tests, may be compared by
calculating $\sum \Delta U_{k_d}$ after lengths are optimally adjusted, to
determine which is most profitable.

Special cases. If all decisions are uniform, from (7.14),

\[ \sum \Delta U_k = v B_{y_1}\, \frac{k}{1 + (k - 1)\, r_{y_1y_1}} - C_T \tag{7.22} \]

Substituting for $v$ (using (7.9)) and differentiating,

\[ \frac{\partial \sum \Delta U_k}{\partial k} = C_T B_{y_1}\, \frac{C_0 (1 - r_{y_1y_1}) - C_1\, r_{y_1y_1}\, k^2}{(C_0 + k C_1)^2 \left[ 1 + (k - 1)\, r_{y_1y_1} \right]^2} \tag{7.23} \]

\[ = 0 \quad \text{if} \quad C_0 (1 - r_{y_1y_1}) - C_1\, r_{y_1y_1}\, k^2 = 0 \tag{7.24} \]

\[ k = \left[ \frac{C_0 (1 - r_{y_1y_1})}{C_1\, r_{y_1y_1}} \right]^{1/2} \tag{7.25} \]

This is the optimum length for each test in uniform adaptive placement
decisions.
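A brief numerical check of this result may be useful. The Python sketch below is an added illustration with hypothetical function names and parameter values; it computes $k = [C_0(1 - r_{y_1y_1})/(C_1 r_{y_1y_1})]^{1/2}$ and evaluates the total gain $vB_{y_1}k/[1 + (k - 1)r_{y_1y_1}] - C_T$ with $v = C_T/(C_0 + kC_1)$, treating $v$ as continuous.

```python
import math

def optimal_length_adaptive(r, C0, C1):
    """Optimum common length for uniform adaptive placement decisions."""
    return math.sqrt(C0 * (1.0 - r) / (C1 * r))

def battery_gain_adaptive(k, B, r, C0, C1, CT):
    """Total gain v*B*k/(1+(k-1)r) - CT with v = CT / (C0 + k*C1)."""
    v = CT / (C0 + k * C1)
    return v * B * k / (1.0 + (k - 1.0) * r) - CT

# Hypothetical values: unit-length reliability 0.1, set-up cost 4, cost per unit 1.
k_adaptive = optimal_length_adaptive(r=0.1, C0=4.0, C1=1.0)   # approximately 6 units
v_adaptive = 60.0 / (4.0 + k_adaptive * 1.0)                  # approximately 6 tests
gain_adaptive = battery_gain_adaptive(k_adaptive, B=5.0, r=0.1, C0=4.0, C1=1.0, CT=60.0)
```

The optimum length depends only on the ratio of set-up cost to cost per unit and on the reliability of the unit-length test; $B_{y_1}$ and $C_T$ affect the size of the gain but not the length at which it is greatest.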